The Contribution of Interview and Situational Performance
Procedures to the Selection of Supervisory Personnel ¹


Robert Glaser, Paul A. Schwarz, and John C. Flanagan 


University of Pittsburgh and the American Institute for Research 


This article presents the results of a study 
concerned with the construction and valida- 
tion of interview and situational performance 
procedures for the selection of supervisory 
personnel. The study was designed primarily 
to determine the unique contribution of such 
procedures when the effects of paper and 
pencil tests and other identifiable predictor 
variables are controlled. 


Method 


Design of the study. Two groups, of 40 super- 
visors each, were selected from 227 civilian super- 
visors employed at two large military depots. These 
groups were selected so that they would have the 
following characteristics: (a) they would differ as 
much as possible on criterion scores of supervisory 
effectiveness and (b) they would be as similar as 
possible on a set of control variables known to be 
predictive of supervisory performance. 

The experimental interview and situational per- 
formance tests were then administered to these 
groups. Because the groups were matched on known 
predictor variables and differed on the criterion of 
job success, it was expected that any differences ob- 
tained on the experimental instruments would re- 
flect primarily the unique contributions of these pro- 
cedures to the identification of high and low criterion 
individuals. 

Criterion instruments. Criterion data were collected
by means of three instruments.

1. A Supervisor Performance Report.² This is a
preferred-choice rating form developed by the Per- 
sonnel Research Branch, which is made up of groups 


1 A project sponsored by the Personnel Research
Branch of the Adjutant General’s Office, Department 
of the Army. Portions of this article were presented 
in a paper at the meetings of the American Psy- 
chological Association in Chicago during September 
1956. 

2 This form is part of the present Army Civilian 
Supervisory Selection Battery and was made avail- 
able by the Department of the Army for this re- 
search. 


of statements descriptive of job behavior. On each 
item, the evaluator selects one statement “most de- 
scriptive” of the S’s job performance and one state- 
ment “least descriptive” of it. This report was com- 
pleted by several raters who were familiar with the 
S’s day-to-day performance. 

2. Ratings of Supervisor Effectiveness. This is a 
set of three rating scales concerned with different
aspects of supervisory ability. On Scales 1 and 2, 
the evaluator compares the subject to descriptions of 
four sample supervisors representing four degrees of 
effectiveness. On Scale 3, a supervisor is compared 
to the “ten best” and “ten poorest” supervisors the 
rater has observed in the course of his job experi- 
ence. This rating form was completed by raters 
who had only a general knowledge of the S’s performance
and reputation, and also by his immediate
supervisor. 

3. The Performance Record (2). This is a day- 
by-day record form of specific actions descriptive 
of effective and ineffective supervisor performance. 
This daily record was maintained over a three-month 
period by each S’s immediate superior. 

A single index of supervisor effectiveness was com- 
puted for each S by combining the scores obtained 
on these criterion instruments. 

Matching (control) variables. In obtaining con- 
trol data for the matching of High and Low cri- 
terion groups, an attempt was made to consider as 
many as possible of the variables known or likely 
to be related to supervisory effectiveness. These in- 
cluded the following: 

Score on a test of basic ability. This is a paper 
and pencil test that is one of the components of the 
present supervisor selection battery developed by the 
Personnel Research Branch. It consists of items on 
verbal meaning, numerical facility, and spatial visu- 
alization. 

Score on a test of supervisory practices. This is 
another paper and pencil component of the present 
battery. It requires judgments of appropriate ac- 
tion in hypothetical problem situations similar to 
those a supervisor would face on the job. Four or 
five alternate ways of dealing with each situation 
are presented, and the supervisor is asked to choose 
the one worst and one best solution. 
Table 1
Criterion and Control Scores for High and Low Groups

Columns: Depot A (Mean, SD); Depot B (Mean, SD)

Composite Criterion
    High Group    155    11.6    16.5
    Low Group      50    24.6    28.8
Basic Ability
    High Group     30    7.7     6.6
    Low Group      29    6.8     5.3
Supervisory Practices
    High Group     18    4.2     4.0
    Low Group      17    4.2     4.4
Age
    High Group     40    8.1     10.1
    Low Group      40    7.6     9.4
Years as Supervisor
    High Group     5.3   2.9     8° 633
    Low Group      6.0   2.2     8     3.7
Job Grade
    High Group     6.6   2.9
    Low Group      5.4   2.8


Age. 

Number of years as a supervisor. 

Present job grade or level. 

Sex and race were also controlled by using similar 
distributions in the High and Low criterion groups. 

Selection of the groups for study. The selection 
of matched High and Low criterion groups was car- 
ried out independently at the two depots on the ba- 
sis of the criterion and control information described 
above. Insofar as possible, an attempt was made to 
match on a man-to-man basis so that the variances 
and the means of the resulting samples would be 
comparable. The results of this procedure are
shown in Table 1. As this table indicates, a fairly
close matching of High and Low groups was possible.

3 Matching of test scores was based on estimated
or regressed “true” scores. An individual’s “true”
score was computed employing the reliability of a
test for the preliminary High and Low subgroup
(N = 40) to which he belonged.

Test development and administration. On the basis
of detailed “test rationales” specifying the supervisor
functions to be predicted (1, 2), five predictor
instruments were constructed: two interview procedures
and three situational performance tests. The
major characteristics of these procedures may be
summarized as follows:



A Standardized Panel Interview. This was con- 
ducted as an informal discussion between the candi- 
date and a panel of three interviewers. Topics and 
probing questions related to supervisory performance 
and attitudes were introduced into the discussion ac- 
cording to a pre-arranged schedule. At the end of 
the interview, each interviewer independently com- 
pleted ratings on the candidate’s personal character- 
istics and attitudes, and on particular aspects of the 
candidate’s responses. 

A Standardized Individual Interview. This was 
administered and scored just like the panel inter- 
view, but was conducted by only one interviewer. 

A Group Discussion Problem. This was set up as
a committee meeting of four candidates responsible 
for developing recommendations on a particular as- 
pect of plant management. An observer evaluated 
the performance of each candidate in terms of (a) 
a checklist of specific discussion behaviors and (b) 
ratings based on the candidate’s contributions dur- 
ing the discussion. The task of the examiner was 
simplified here by the use of a “time-sampling” pro- 
cedure, in which the discussion was divided into 
discrete observation periods. Within each of these 
periods only the presence or absence (rather than 
the frequency of occurrence) of each checklist be- 
havior was recorded. 

A Role-Playing Situation. Here the candidate was 
required to deal with a “staged” personnel problem 
as he would deal with it in an actual job situation.
An assistant examiner played the role of the subordinate
involved in the problem and interacted
with the candidate in a relatively standardized man- 
ner. The examiner recorded specific aspects of the 
candidate’s performance on a checklist of effective 
and ineffective behaviors. 

A Small-Job Management Problem. This involved 
the utilization of personnel and materials in a minia- 
ture work situation. The candidate was required to 
train subordinates, organize the work flow, and 
monitor job activities. He was scored by an ob- 
server both on a checklist of effective and ineffec- 
tive supervisory actions and in terms of his actual 
work output. 
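
The “time-sampling” scoring described for the Group Discussion Problem can be illustrated with a short sketch. It assumes that each observation period is recorded as the set of checklist behaviors seen at least once in that period; the behavior names and the four sample periods are hypothetical and are not taken from the study's checklist.

```python
from typing import Dict, List, Set


def time_sample_scores(periods: List[Set[str]], checklist: List[str]) -> Dict[str, int]:
    """For each checklist behavior, count the observation periods in which
    it was present at least once (presence/absence, not frequency)."""
    return {behavior: sum(1 for seen in periods if behavior in seen)
            for behavior in checklist}


# Hypothetical checklist and observer record for one candidate.
checklist = ["asks_for_opinions", "summarizes", "interrupts"]
periods = [
    {"asks_for_opinions", "interrupts"},   # period 1
    {"asks_for_opinions"},                 # period 2
    set(),                                 # period 3: nothing observed
    {"summarizes", "asks_for_opinions"},   # period 4
]

scores = time_sample_scores(periods, checklist)
print(scores)
```

Because only presence or absence is recorded, a behavior repeated several times within one period still adds a single point, which is what simplifies the observer's task.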

This experimental test battery was administered to 
the selected sample of 40 supervisors—20 High and 
20 Low—at each depot, making a total sample of 
80. Each candidate was tested on two forms of the 
tests with the exception of the Small-Job Manage- 
ment Problem for which replication was not feasible 
for administrative reasons. In order to eliminate 
the possibilities of bias resulting from prior knowl- 
edge of the candidates’ capabilities and reputations, 
each depot supplied the examiners for the testing of 
the other installation. 


Analysis of Results 


Analysis of the results of this study is 
based upon statistical procedures appropriate 
to the analysis of data from extreme groups. 
Since estimates of predictor variances computed
directly from extreme groups would
yield overestimates, the computation of these 
variances was carried out by the procedure 
recommended by Peters (3) and Peters and 
Van Voorhis (4). This procedure was used 
for all computations requiring an estimate of 
the predictor variance. Since criterion scores 
had been obtained for the entire sample from 
which the extreme groups were drawn, the 
value of the criterion variance was computed 
directly. 

The analysis of results is concerned with 
three aspects: test reliabilities, validities of 
the single predictors, and validities of com- 
posite predictors. 


Predictor Reliabilities 


Alternate form reliability coefficients were 
computed for all predictors except the Small- 
Job Management Problem, only one form of 
which had been administered. Table 2 shows 
these results. This table shows the alternate 
form reliabilities obtained at each depot and 
then the average of the two. 

These results indicate that the Group Discussion
Problem in general is the most reliable
of the predictors. The reliability of the
interviews is somewhat lower, with the Panel
Interview consistently superior to the Individual
Interview. The reliability of the Role-Playing
Situation is the lowest throughout.


Table 2

Alternate Form Reliability Coefficients

                                 Depot A   Depot B   Two-Depot Average*

Standardized Panel Interview       .77       .47       .65
Individual Interview               .7?       .29       .53
Group Discussion Problem           .66       .80       .74
Role-Playing Situation             .4?       .18       .34

* Averaged by r-to-z transformation.
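
The r-to-z averaging used for the two-depot figures can be sketched as below. This is a minimal illustration of the usual Fisher transformation with equal weight given to each depot; the equal weighting and the function name are assumptions, since the article does not give its computing details.

```python
import math
from typing import Sequence


def average_r(rs: Sequence[float]) -> float:
    """Average correlations by converting each r to Fisher's z,
    taking the unweighted mean of the z values, and converting back."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Depot A and Depot B reliabilities for two of the predictors in Table 2.
panel_interview = average_r([0.77, 0.47])
group_discussion = average_r([0.66, 0.80])
print(round(panel_interview, 3), round(group_discussion, 3))
```

Both results fall within about .01 of the tabled averages (.65 and .74); small differences of this size are to be expected from rounding in the published coefficients.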


Validity Estimates for Individual Predictors 


Two indices of validity were computed for
each predictor instrument: a t ratio, representing
the significance of the difference between
the mean scores of the High and Low


Table 3

Significance of High and Low Group Mean Differences (t ratios) on Individual Predictors

Columns: Depot A (N_H = N_L = 20); Depot B (N_H = N_L = 20); Depot A + Depot B (N_H = N_L = 40)

Panel Interview
    Form I          1.3
    Form II
    Form I + II     1.6
Individual Interview
    Form I          1.7*
    Form II         2.0**
    Form I + II     2.0**
Group Discussion Problem
    Form I           .6
    Form II         3.0***
    Form I + II     2.0**
Role-Playing Situation
    Form I
    Form II
    Form I + II
Small Job Management

* Significant at approximately the 10% level.
** Significant at approximately the 5% level.
*** Significant at approximately the 1% level.











Table 4

Biserial r’s for Individual Predictors

Columns: Depot A (N_H = N_L = 20); Depot B (N_H = N_L = 20); Two-Depot Average (r-to-z combination)

Panel Interview
    Form I          .17
    Form II         .23
    Form I + II     .21
Individual Interview
    Form I          .23
    Form II         .26
    Form I + II     .27
Group Discussion Problem
    Form I          .09
    Form II         .39
    Form I + II     .27
Role-Playing Situation
    Form I          .09
    Form II         .20
    Form I + II     .17
Small Job Management    .09






criterion groups; and a biserial correlation 
coefficient, based on the estimates of the pre- 
dictor variances obtained from the Peters and 
Van Voorhis procedure.* The analysis of 
mean differences constitutes a test of the null 
hypothesis that the predictors fail to dis- 
criminate between the extreme groups while 
the validity coefficients provide an estimate 
of the relationship between the predictors and 
the criterion over the total sample. 
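
As an illustration of the first index, a t ratio for the difference between two group means can be computed as sketched below. The pooled-variance form is an assumption, since the article does not state which formula was used, and the scores shown are hypothetical. The sketch does not attempt the Peters and Van Voorhis variance estimate on which the biserial coefficients rest.

```python
import math
from statistics import mean, variance
from typing import Sequence


def t_ratio(high: Sequence[float], low: Sequence[float]) -> float:
    """Pooled-variance t ratio for the difference between the mean
    predictor scores of the High and Low criterion groups."""
    n_high, n_low = len(high), len(low)
    pooled_var = ((n_high - 1) * variance(high) + (n_low - 1) * variance(low)) \
        / (n_high + n_low - 2)
    std_error = math.sqrt(pooled_var * (1 / n_high + 1 / n_low))
    return (mean(high) - mean(low)) / std_error


# Hypothetical predictor scores for two small criterion groups.
high_group = [12, 14, 16, 18]
low_group = [8, 10, 12, 14]
t = t_ratio(high_group, low_group)
print(round(t, 2))
```

A positive value indicates that the High criterion group scored higher on the predictor; its significance would then be read from a t table with n_high + n_low - 2 degrees of freedom.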

* As an indication of the extent to which the biserial
coefficients from widespread classes approximated
the coefficients that would have been obtained
from total group data, both kinds of coefficients
were computed for the tests of Basic Ability and
Supervisory Practices. The results showed close
agreement for both tests at both depots.





These results are summarized in Tables 3 
and 4, respectively. In reviewing these data, 
it should be remembered that: The level of 
prediction indicated by these analyses repre- 
sents only that portion of an instrument’s to- 
tal predictive power that is independent of the 
Basic Ability and Supervisory Practices tests, 
and of the other matching variables; and that 
the range of talent in the present sample is 
restricted by the use of experienced super- 
visors, so that the obtained results are lower 
than those likely to be found for a group of 
typical candidates. 

The results reported in Tables 3 and 4 in- 
dicate the following: The Group Discussion 


Table 5

Validities of the Basic Abilities and Supervisory Practices Tests

                                        Depot A     Depot B     Two-Depot Average
                                        (N = 109)   (N = 118)   (r-to-z combination)

Criterion vs. Basic Abilities             .28         .23         .25
Criterion vs. Supervisory Practices       .26         .19         .23
Criterion vs. (Basic Abilities +
  Supervisory Practices)                  .29         .25         .27

Table 6

Multiple r’s for Present Battery Plus Composites of Experimental Predictors*

                                     Depot A   Depot B   Two-Depot Average
                                                         (r-to-z combination)

(BA + SP) + GD                         .38       .26       .32
(BA + SP) + PI                         .35       .25       .30
(BA + SP) + II                         .38       .25       .32
(BA + SP) + RP                         .32       .27       .30
(BA + SP) + GD + RP                    .38       .27       .33
(BA + SP) + GD + II                    .42       .25       .34
(BA + SP) + GD + II + PI               .41       .25       .33
(BA + SP) + GD + II + RP               .40       .26       .33
(BA + SP) + GD + II + PI + RP          .40       .26       .33

* BA = Basic Abilities; SP = Supervisory Practices; GD = Group Discussion Problem; PI = Panel Interview; II = Individual Interview; RP = Role-Playing Situation.



Problem shows up as perhaps the most prom- 
ising of the instruments; contrary to expec- 
tations, the Panel Interview shows no su- 
periority over the Individual Interview; the 
interview procedures and the Role-Playing 
Situation are about equally successful; the 
Small Job Management Problem shows little 
promise for predictive effectiveness as it was 
administered in this study. 


Predictor Composites and Operational 
Batteries 


The predictive values of several combina- 
tions of the test instruments are presented in 
terms of the total validity of the experimental 
predictors in combination with the Basic 
Ability and Supervisory Practices tests of the 
existing battery; a combination of this type 
would be employed in actual practice. Table 5 
presents the validities of the existing Basic 
Ability and Supervisory Practices tests, singly 
and in combination (equally weighted). In 
Table 6 these tests are combined with differ- 
ent composites of the predictors studied. In 
all these composites test components received 
equal weighting. Since the criterion groups
were matched on Basic Ability and Super- 
visory Practices scores, the correlation of 
these tests with the experimental predictors 
in this case is essentially zero, and multiple 
correlation coefficients were computed on this 
basis. 
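
The effect of the zero-correlation assumption can be shown with the standard identity for two uncorrelated predictors, in which the squared multiple correlation is the sum of the squared validities. The sketch below is an illustration of that identity only; it is not claimed to be the authors' exact procedure, and the input values are hypothetical.

```python
import math


def multiple_r_uncorrelated(r_battery: float, r_new: float) -> float:
    """Multiple correlation of a criterion with two predictors that are
    uncorrelated with each other: R = sqrt(r1**2 + r2**2)."""
    return math.sqrt(r_battery ** 2 + r_new ** 2)


# Hypothetical validities: an existing battery and one added predictor.
combined = multiple_r_uncorrelated(0.30, 0.40)
print(round(combined, 2))
```

Under this identity even a predictor of modest validity adds something to the combined validity, provided it is independent of the tests already in use, which is the situation the matching procedure was designed to create.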

In general, these findings of the study in- 
dicate that some contribution to the predic- 
tive value of the paper and pencil tests can 


be made by the addition of the predictors 
studied. The results are not differentiating 
enough between the experimental tests to sug- 
gest certain of the predictors to the exclusion 
of others. From a practical point of view, 
however, the similarity of the results ob- 
tained with the panel and individual inter- 
view procedures suggests that the more eco- 
nomical individual interview might be used 
where an interview is specifically desired. 
With respect to further development of these 
tests, the fact that the Role-Playing Situa- 
tion gave validities comparable to other pre- 
dictors despite its low reliability suggests that 
revision of the administrative and scoring 
procedures of this test may be fruitful. Fi- 
nally, it seems important in terms of testing 
efficiency that the Group Discussion Problem 
permitted comparable evaluation of four can- 
didates in the same amount of time in which 
one candidate could be evaluated in an inter- 
view. 

Received April 22, 1957. 
A Further Note on the Fakability of the MTAI ¹


A. G. Sorenson and M. S. Sheldon 
School of Education, University of California, Los Angeles 


The problem of getting honest, objective, 
and straightforward answers to personality 
and interest inventories has been of concern 
to test users for some time. A related problem,
and the concern of this paper, is how to
find out whether or not a particular inven- 
tory, in this case the Minnesota Teacher At- 
titude Inventory, can be falsified. There are 
at least five published studies (1, 2, 4, 5, 6) 
of the fakability of the MTAI. Each of the 
investigators had his subjects complete the 
inventory twice under differing conditions. 
However, each used somewhat different instructions
and reported somewhat different findings.
Since it appeared that some of the discrepancies
in findings might be due to the
fact that each investigator was concerned 
with the effect of different conditions of ad- 
ministration, it was decided to conduct a 
study which incorporated all these conditions 
in a factorial design. The report which fol- 
lows will present the findings of that study 
and compare them with those of earlier stud- 
ies. It will also discuss some of the implica- 
tions of the findings for fakability studies in 
general. 


Review of Three Representative Studies 


Callis (1), one of the authors of the MTAI,
worked with three groups of college students. 
The students in one group first completed the 
inventory under standard directions. Four to 
six weeks later they repeated the inventory 
under directions to “fake good,” i.e., to make 
as high a score as possible by answering the 
items the way they thought a good teacher 
would. In a second group the students were 
asked to “fake good” on the first administra- 
tion of the inventory. A week to 10 days 
later they repeated the inventory under standard
directions. A third group, the control,
was also tested twice, a week to 10 days 

1 This study was supported in part by the Fund
for Occupational Research of the School of Education,
UCLA.


apart, and received standard directions both 
times. Callis’ results showed that the group 
which “faked good” on the second adminis- 
tration raised its mean score 9.6 points, a dif- 
ference significant at the .01 level of confi- 
dence. The mean faked score for the group 
that was told first to fake and then to repeat 
under standard directions, was a statistically 
insignificant 1.8 points higher than its mean 
score under standard directions. (This would 
indicate that “Order” as a variable, i.e., 
whether the subjects received standard direc- 
tions prior to faking directions or vice versa, 
should be considered in a fakability study.) 
The group which worked under standard di- 
rections both times added 4.2 points to its 
mean upon the second completion of the in- 
ventory. This gain was significant at the .02 
level. Callis concluded that the MTAI “may 
be susceptible to faking to a limited extent.” 

Rabinowitz (4) also tested three groups 
twice. A control group responded both times 
under standard directions. A second group 
responded first under standard directions and 
second after being instructed to fake in a 
permissive direction, i.e., as if applying for a 
job at a school where the principal thought 
good teachers were characterized by “mutual 
affection and sympathetic understanding.” A 
third group responded first under standard 
directions and second after being instructed 
to fake in an authoritarian direction, i.e., as 
if applying for a job at a school where the 
principal thought good teachers were characterized
as maintaining “relations in which the
pupils respect the authority of the teacher, 
and the teacher accepts that authority as a 
trust.” All the subjects were tested at one 
session. None was aware that a second test- 
ing would occur, nor did any have access to 
their first answers. An analysis of variance 
was performed on the results of both the first 
and second administrations. The first ad- 
ministration of the test, where standard direc- 
tions were used, did not produce significant 
differences among the three groups. On the
second administration, the groups receiving 
the two sets of faking instructions responded 
differently from each other and from the 
group which proceeded under standard instructions.
The differences were significant

at the .01 level. (This would indicate that 
instructions to fake in a particular direction 
produce different results. Apparently instruc- 
tions as to direction may provide a “respond- 
ing set,” i.e., a cue as to how the inventory 
can be faked, and “direction” instructions 
become a relevant variable in fakability 
studies.) Rabinowitz further found that the 
group which completed the inventory twice 
under standardized directions did not change 
its mean score significantly. Both groups who 
faked produced means which differed signifi- 
cantly from their original mean scores. Ra- 
binowitz concluded on the basis of these find- 
ings that the MTAI “may have limited value 
for selection purposes.”

Sorenson (5) had two groups of elementary 
teacher candidates and two groups of second- 
ary teacher candidates complete the inven- 
tory, first under standard directions and a 
second time as if applying for a job “in a 
school system which is known to be progres- 
sive.” Half the subjects in each group were 
told to sign their names to the answer sheets 
and half responded anonymously. All groups 
raised their mean scores significantly under
faking instructions. There was a statistically 
significant difference between the means of 
those subjects who signed and those who did 
not sign the answer sheets under standard di- 
rections. There was a significant difference 
in the mean “gains” score of the signed and 
unsigned groups. The “gains” score was the 
difference between an individual’s scores on 
the two administrations. (Thus it is indi- 
cated that whether or not subjects sign the 
answer sheets is a relevant variable in fak- 
ability studies.) Sorenson concluded that the 
MTAI probably can be faked by prospective 
teachers and that a subject’s “beliefs regard- 
ing the use to which the scores will be put 
and his understanding of the directions may 
influence his responses to an inventory.” 

According to these three studies then, three 
variables which should be controlled in an in- 


vestigation of the fakability of the MTAI are 
(a) Order, whether the subjects respond to 
standard directions first or to faking directions
first; (b) Responding Set, whether or
not the subjects are instructed to fake in a 
particular direction, since such directions may 
provide a cue as to the correct answers; (c) 
Signing, whether or not the subjects’ identi- 
ties are to be known. 


Procedures Employed in the Factorial Study 


The sample consisted of 156 students in the School 
of Education at UCLA—all candidates for the sec- 
ondary teaching credential. The subjects were ran- 
domly assigned to 12 groups. The 12 groups were 
then divided into three sets of 4 groups each. Then 
each set was divided into two pairs. All the sub- 
jects completed the MTAI twice, once having re- 
ceived only the directions printed on the cover of 
the inventory, and once having been instructed to 
fake the inventory in one of the following three 
ways. The first set of four groups was instructed 
to “assume that you are applying for a teaching po- 
sition in a school rumored to be progressive. Re- 
spond to the items of this inventory in a way that 
you think will be most likely to get you the job.” 
The second set of four groups received the same 
faking directions except that the word “traditional” 
was substituted for “progressive.” The remaining 
set of four groups was instructed to “assume that 
you are applying for a teaching position. Respond 
to the items of this inventory in a way that you 
think will be most likely to get you the job.” No 
indication as to the nature of the school was given. 

Half of the subjects, one pair of groups from each 
of the sets of four, were given the faking instruc- 
tions first and then asked to complete the inventory 
under standard directions; while the other half of 
the subjects, i.e., the alternate pairs of groups, completed
the inventory under standard directions first
and then were asked to fake. In one group from 
each pair the students were instructed not to identify 
themselves while in the second group of each pair 
they were told to print their names on the answer 
sheets. 
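
The layout of the 12 groups can be sketched as the crossing of three faking instructions, two orders, and two signing conditions. The sketch below assumes equal groups of 13 (156 subjects divided by 12), which the article does not state explicitly; the condition labels, the subject numbering, and the seed are hypothetical.

```python
import random
from itertools import product
from typing import Dict, Iterable, List, Tuple

RESPONDING_SETS = ("progressive", "traditional", "none")
ORDERS = ("standard_first", "faking_first")
SIGNING = ("signed", "anonymous")

Cell = Tuple[str, str, str]


def assign_subjects(subject_ids: Iterable[int], seed: int = 0) -> Dict[Cell, List[int]]:
    """Randomly assign subjects to the 3 x 2 x 2 = 12 cells of the design,
    filling the cells in rotation so that group sizes stay equal."""
    cells = list(product(RESPONDING_SETS, ORDERS, SIGNING))
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    groups: Dict[Cell, List[int]] = {cell: [] for cell in cells}
    for position, subject in enumerate(ids):
        groups[cells[position % len(cells)]].append(subject)
    return groups


groups = assign_subjects(range(156))
for cell, members in groups.items():
    print(cell, len(members))
```

Each set of four groups in the text corresponds to one responding set, each pair within a set to one order, and each group within a pair to one signing condition.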

Both administrations of the inventory took place 
during a single session. The students received writ- 
ten instructions. There was one proctor for each of 
the 12 groups. As soon as a subject finished the in- 
ventory, the proctor picked up his answer sheet and 
the instruction sheet, and handed him a second sheet 
of instructions and a second answer sheet numbered 
the same as his first. 

The answer sheets were scored and rescored by 
machine using an elimination key. A constant of 
100 was added to each score. “Change” scores were 
computed for each subject, and a constant of 200 
was added to each. Except for the groups which 
had faked in the traditional direction, the change 
scores were computed by subtracting the scores obtained
under standard directions from the faked
scores, to determine how much the students raised 
their scores when they faked. In the case of the 
four groups who faked in the traditional direction 
the process was reversed to determine how much 
they had lowered their scores when they faked. 
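
The scoring rules just described can be restated as a short sketch. The constants of 100 and 200 and the reversal for the traditional direction come from the text; the function name, the direction labels, and the sample raw scores are hypothetical.

```python
SCORE_CONSTANT = 100    # added to every MTAI score
CHANGE_CONSTANT = 200   # added to every change score


def change_score(standard_raw: int, faked_raw: int, direction: str) -> int:
    """Change score for one subject. For the progressive and undirected
    groups it measures how far the faked score rose above the standard
    score; for the traditional groups it measures how far it fell."""
    standard = standard_raw + SCORE_CONSTANT
    faked = faked_raw + SCORE_CONSTANT
    if direction == "traditional":
        difference = standard - faked
    else:
        difference = faked - standard
    return difference + CHANGE_CONSTANT


raised = change_score(40, 70, "progressive")    # raised score by 30
lowered = change_score(40, 10, "traditional")   # lowered score by 30
wrong_way = change_score(40, 30, "none")        # moved 10 points downward
print(raised, lowered, wrong_way)
```

With the reversal, a large change score always means successful faking in the instructed direction, so the three responding-set conditions can be compared on one scale.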

The statistical treatment of the data included four 
three-way analyses of variance. The first analysis 
of variance was of the scores obtained under stand- 
ard directions, the second was of faked scores, and 
the third was of change scores. In each of these 
three analyses of variance the main effects were or- 
der, responding set or direction, and signing. Thus 
each analysis employed a 2 X 2 X 3 factorial design. 

A fourth analysis of variance was performed on 
the scores of the four groups told to fake but who
were not given a responding set, i.e., who were given 
no suggestion as to direction. The main effects in 
this analysis were order, signing, and change (a com- 
parison of the scores under standard instructions 
with those under faked instructions). 


Results 


The results of the four analyses of variance 
appear in Tables 1 through 4. Table 1 pre- 
sents the results of the analysis of variance of 
the scores achieved under standard instruc- 
tions. None of the three main effects nor 
any of the interactions resulted in F ratios 
which are statistically significant. Although 
the means for the 12 groups varied from a 


low of 31 to a high of 60, the over-all test of
significance indicates that these means could 
have been drawn from the same population 
by chance. 

Table 2 shows the analysis of variance of 


Table 1

Analysis of Variance of MTAI Scores Obtained
Under Standard Directions

Source of Variation                          df          MS

Order                                         1      1,628.30
Responding set                                2      2,522.33
Identification                                1      1,046.26
Order X Responding set                        2        147.26
Order X Identification                        1         77.57
Identification X Responding set               2      1,595.51
Order X Identification X Responding set       2         92.20
Within groups                               144        974.74

Total                                       155

*** Less than unity.



Table 2

Analysis of Variance of MTAI Scores Obtained
Under Faking Directions

Source of Variation                          df          MS

Order                                         1     18,156.98
Responding set                                2    129,921.33
Identification                                1          1.08
Order X Responding set                        2     13,183.70
Order X Identification                        1         53.09
Identification X Responding set               2      2,432.34
Identification X Responding set X Order       2        864.15
Within groups                               144      2,219.31

Total                                       155

* Significant at the .05 level of confidence.
** Significant at the .01 level of confidence.
*** Less than unity.



the scores achieved under faking conditions. 
The F ratios for the main effects of order and 
responding set are both statistically significant
at the .01 level of confidence.² The interaction
between these variables is significant
at the .05 level. 

Table 3 shows the results of the analysis of 
variance of the change scores. It will be seen 
that the F ratios for order and responding set 
and their interactions again are statistically 
significant. 
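
The F ratios in these analyses are each effect's mean square divided by the within-groups mean square. As a check, the sketch below recomputes the four F ratios that Table 3 reports for the change scores from the mean squares in that table; the variable names are illustrative only.

```python
# Mean squares taken from Table 3 (analysis of variance of change scores).
MS_WITHIN = 2251.12
effect_mean_squares = {
    "Order": 10700.41,
    "Responding set": 79820.85,
    "Identification": 3663.69,
    "Order X Responding set": 11080.72,
}

# F ratio = effect mean square / within-groups mean square.
f_ratios = {effect: ms / MS_WITHIN for effect, ms in effect_mean_squares.items()}

for effect, f in f_ratios.items():
    print(f"{effect}: F = {f:.2f}")
```

The recomputed values agree with the tabled F ratios of 4.75, 35.46, 1.63, and 4.92.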

Table 4 shows the results of the analysis of 
variance performed on the scores of the four 
groups not given a responding set. The only 
effect or interaction of statistical significance 
is identification, or whether or not the sub- 
jects signed the answer sheet. 


Discussion 


In light of the factorial study and the 
studies reviewed above, it would appear that 


2 The Bartlett test for homogeneity was made be- 
fore carrying out any of the analyses. The variances 
for the groups under standard directions and for the 
groups not given a responding set were found to be 
homogeneous. The variances for the groups under 
faking directions and for the change scores were 
found not to be homogeneous. However Cochran’s 
test indicated that the variances were not so dispa- 
rate as to affect the analyses. This, together with 
the Norton study (3), encouraged the present au- 
thors to utilize the analysis of variance technique 
even though the assumption of homogeneity was not 
fulfilled. 










the following statements can be made con- 
cerning the fakability of the MTAI as it has 
been studied thus far: 

1. If the subjects have a responding set 
their chances of faking successfully will ap- 
parently be much greater than if they do not 
receive such a cue. The difference between 
the findings of Callis on the one hand and 
Rabinowitz and Sorenson on the other might 
be accounted for by the fact that both Ra- 
binowitz and Sorenson introduced a respond- 
ing set, i.e., instructions to fake in a particu- 
lar direction, whereas Callis did not. 

2. When subjects complete the inventory 
under directions to fake before they respond 
to standard instructions, the change scores 
are likely to be smaller than if the reverse 
order is used. 

3. The effect of identification, or signing, is 
still undetermined, since the findings are not 
consistent. 

4. The question of the effect of practice 
on the MTAI is one about which the pres- 
ent study provides only little evidence, but 
probably deserves comment. It will be re- 
called that when Callis’ control group re- 
peated the inventory under standard instruc- 
tions it showed an increase in mean score 
which was statistically significant. Rabino- 
witz also had a control group repeat the in- 
ventory under similar conditions but it did 
not show a significant increase in mean score. 


Table 3

Analysis of Variance of MTAI “Change” Scores

Source of Variation                          df        MS          F
Order                                         1    10,700.41     4.75*
Responding set                                2    79,820.85    35.46**
Identification                                1     3,663.69     1.63
Order X Responding set                        2    11,080.72     4.92**
Order X Identification                        1     2,464.11
Identification X Responding set               2     1,307.99      ***
Order X Identification X Responding set       2       539.25      ***
Within groups                               144     2,251.12
Total                                       155

* Significant at the .05 level of confidence.
** Significant at the .01 level of confidence.
*** Less than unity.
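The F ratios in Table 3 can be checked against the mean squares. The sketch below assumes that each mean square pairs with the sources of variation in the order listed, and that every effect is tested against the within-groups mean square; the dictionary keys and function name are illustrative only.

```python
# Mean squares as read from Table 3. The pairing of values with sources
# is an assumption based on the order in which they are listed.
MEAN_SQUARES = {
    "Order": 10700.41,
    "Responding set": 79820.85,
    "Identification": 3663.69,
    "Order x Responding set": 11080.72,
    "Order x Identification": 2464.11,
    "Identification x Responding set": 1307.99,
    "Order x Identification x Responding set": 539.25,
}
MS_WITHIN = 2251.12  # within-groups (error) mean square, 144 df


def f_ratio(ms_effect: float, ms_error: float = MS_WITHIN) -> float:
    """Return the F ratio of an effect mean square to the error mean square."""
    return ms_effect / ms_error


f_ratios = {source: f_ratio(ms) for source, ms in MEAN_SQUARES.items()}

for source, f in f_ratios.items():
    # Ratios below 1.0 correspond to the "less than unity" entries.
    label = "less than unity" if f < 1 else f"{f:.2f}"
    print(f"{source:<42}{label}")
```

Under these assumptions the recomputed ratios for Order, Responding set, Identification, and Order X Responding set agree with the tabled values to two decimal places.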


Table 4

Analysis of Variance of MTAI Scores Obtained Under Standard
Instructions and Under Faking Instructions When No Responding
Set Was Given

Source of Variation                    df        MS
Order                                   1     2,233.88
Change                                  1       162.49
Identification                          1     4,472.34
Order X Change                          1     1,137.86
Order X Identification                  1        74.47
Identification X Change                 1       199.40
Identification X Change X Order         1        13.86
Within groups                          96       779.06
Total                                 103

** Significant at the .01 level of confidence.
*** Less than unity.


In the present study the four groups which 
completed the inventory under standard di- 
rections and then repeated under instructions 
to fake, but with no responding set being 
given, did not show a significant increase. 
The present study is similar to that of Ra- 
binowitz in that the subjects repeated the in- 
ventory during the same session, whereas 
Callis’ students repeated after a delay of a 
week or more. A possibility to be considered 
is that through discussion or other means the 
subjects acquired cues which influenced their 
second performance on the inventory. 

As was suggested earlier, whether or not a 
student will attempt to fake may depend 
upon the use he expects to be made of the 
scores. It is more likely that he will attempt 
to fake in a selection situation than in a 
counseling situation. In none of the studies 
reported here were the subjects completing the 
inventory under selection conditions. They 
were only asked to pretend that they were 
performing under selection conditions.3

3 The authors administered the MTAI to a group of students as a part
of the routine selection process in the School of Education at UCLA.
It is assumed that these samples are from the same population as that
reported earlier by Sorenson (5). In that study two groups of
prospective elementary teachers achieved means, under standard
directions, of 51 and 45, and SD’s of 22 and 28, respectively. Each
of two groups of prospective secondary teachers achieved a mean of 41
and SD’s of 30. In the present study the prospective elementary
teachers, N = 79, achieved a mean of 46 with an SD of 29. The
prospective secondary teachers achieved a mean of 37 and SD of 29.
Obviously these data do not support the hypothesis that students
completing the MTAI under selection conditions will, as a group,
show higher scores.

While it would appear that groups of students are not able to fake
the MTAI, unless given a cue, there are at least two other questions
relative to the faking problem that deserve attention. In the studies
at UCLA there have been individuals who made statistically
significant increases in scores, even without a cue from the
directions. How do such students differ from those who did not change
their scores, or who changed in the wrong direction? Second, what
would be the predictive validity of the MTAI if it were administered
under selection conditions?


Summary


This study employed a factorial design to investigate the effects of
several conditions of administration on the fakability of the MTAI.
The findings indicate that in the kind of fakability studies which
have been conducted with the MTAI, whether the subjects fake the test
first and then respond under standard instructions, or vice versa,
and whether the instructions give a cue as to the nature of the
inventory, will influence the results. These findings are discussed
in relation to several previous studies. In general the findings
support the conclusion that groups of students are not likely to be
able to fake the MTAI unless they receive a cue from the faking
instructions, or elsewhere, as to what the inventory is about.
Symbols generated from multi-element 
“printing” matrices are frequently used to 
represent more conventional Arabic numerals. 
An example is the automatic score board used 
in large sporting stadiums. Recently, the fea- 
sibility of several visual coding schemes for 
displaying various types of numerical infor- 
mation in air traffic situations has been 
studied. Typical potential uses would in- 
clude an airborne “printer” for transmitting 
information to an aircraft crew (5) and elec- 
tronic “printers” for indicating the identity 
of a target blip on a cathode ray tube dis- 
play (1, 8, 12). 

Cohen and Webb (5) studied the use of 
symbolic Arabic numerals that were gener- 
ated from a six-element straight-line matrix. 
They found performance with conventional 
numerals superior to performance with these 
symbolic numerals. For further study, they 
suggested the use of an eight-element matrix 
because it appeared to provide an improved 
series of symbolic numerals as well as a fairly 
readable series of symbols representing the 
26 letters of the English alphabet (5, pp. 8- 
9, 14). 

The symbolic Arabic numerals used in this 
study are based on those suggested by Cohen 
and Webb, and on the results of another 
study in which optimal symbols were selected 
for several of the numerals (12, pp. 84-89). 
These numerals are shown in Fig. 1 along 
with the basic eight-element straight-line matrix 
from which they were drawn. For transmitting 
numerical information, performance 
with these numerals has been found to be not 
much different from performance with con- 
ventional numerals (1). This appears to be 
a reasonable finding in view of other evi- 
dence apparently favoring performance with 
straight-line and angular numerals over performance 
with more conventional figures.3 
The present study was designed to test this 
finding. 

Specifically, the present study was de- 
signed to compare the information-handling 
performance of Ss responding to two sets of 
Arabic numerals—one a set of conventional 
figures, the other an optimized set of straight- 
line symbolic figures. In addition, verbal as 
well as motor (key-pressing) responses were 
used because previous findings (1) had indicated the possibility of
an interaction between the two types of numerals and these two modes
of response. Finally, practice effects were studied for both types of
numerals and for both modes of response.

Fig. 1. The eight-element straight-line matrix from which were drawn
the symbolic Arabic numerals used in this study. The symbolic
numerals are identified in the figure by the smaller conventional
Arabic numerals below the symbols.

3 For example, in reviewing the results of some 13 studies on
legibility, Tinker (15) concluded that maximum legibility was
obtained with Roman capitals—figures made up almost entirely of
straight lines and sharp angles. Berger (3), in designing numerals to
give optimal visibility for white lettering on black, patented a
series of numerals also consisting almost entirely of straight lines.
More recently, Lansdell (7) compared two standard sets of numerals
(the Mound and the Mackworth) with a new set of angularly formed
numerals, and found performance under difficult viewing conditions to
be better with the new set than with the standard.


Method 


The experiment was conducted in two parts. In 
Part M, 24 Ss made motor (key-pressing) responses 
to the different stimuli over a period of two days, 
and five Ss continued responding over 10 additional 
days. In Part V, verbal responses were made by a 
different group of Ss under otherwise identical con- 
ditions. 

Apparatus. A Serial Discrimeter, designed and 
constructed in the Laboratory of Aviation Psychol- 
ogy, The Ohio State University, was the apparatus 
used. It has been described elsewhere (1, 10, 11) 
and is similar in function to an instrument described 
by Morin and Grant (9). Essentially, it consists of 
five basic components: programming unit, switch- 
ing unit, display unit, response unit, and scoring 
unit. 

The programming unit consists of a board con- 
taining 100 rows of ten single-pole double-throw 
toggle switches each. One of the ten switches in 
each row is set by E; this fixes the sequence of 
stimuli to be activated on each of 100 serial stimu- 
lus presentations. 

The switching unit interrogates the rows of the 
programming unit in serial order and transmits sig- 
nals to the display unit. Under self-pacing condi- 
tions, the switch advances from one row to the next 
whenever a response is made by S. 

Many different display units and response units 
can be used with the apparatus. The display used 
in the present study consisted of a 10-in. diameter 
opal glass screen. The various symbols were pro- 
jected onto the screen from the back by means of a 
ten-unit optical projector, each unit of which con- 
tained a different photographic transparency. 

The response unit used in Part M consisted of a 
bank of ten finger keys placed directly in front of 
S on a table. The keys were arranged horizontally 
in two semicircles to correspond with the natural 
placement of the ten finger tips. The finger keys 
were numbered, from left to right, 1, 2, 3, ..., 9, 0 
(0 represented 10), and S was told to respond by 
pressing the key corresponding to the specific sym- 
bol presented on the screen. The response unit used 
in Part V consisted of a boom microphone connected 
to a square-wave impulse amplifier, and S responded 
by speaking into the microphone (“one,” “two,” 
etc.). During both parts, broad-band noise at ap- 
proximately 70 db. was presented to S through ear- 
phones in order to mask extraneous sounds. 

The scoring unit consists of two elements: a tim- 
ing element and an automatic recording element. 
The timing element used here was a Standard Elec- 
tric Timer on which was recorded the total time for 


a series of 100 stimulus presentations. The automatic 
recording element consists of a two-dimensional 
matrix containing 110 three-place electromechanical 
counters arranged in a 10 X 11 stimulus-response 
matrix. 

Although the timing element was used during both 
parts of the experiment, the automatic recording ele- 
ment was used only with the key-pressing responses 
of Part M. It was not used in Part V because the 
apparatus could not discriminate among the differ- 
ent verbal responses. Error patterns in Part V were 
recorded by a monitor using lists of the programmed 
stimuli; audiotape recordings were also taken for 
use in resolving any ambiguity in this scoring. 

Stimuli. Two sets of Arabic numerals, one sym- 
bolic and one conventional, were used in this ex- 
periment. The symbolic numerals were straight-line 
figures generated from an eight-element matrix; 
these were illustrated in Fig. 1. The conventional 
numerals were the AND-10400 numerals recom- 
mended by Baker and Grether (2) for use in in- 
strument identification. The symbols from each of 
the two sets appeared in the experiment as }-in. 
high light patterns at the center of the display, ap- 
proximately 28 in. in front of the seated S. 

Subjects. The Ss were 48 men obtained from a 
pool of students who had volunteered to serve as 
part-time Ss. No S had prior experience with the 
apparatus or with the symbolic numerals. They 
were randomly divided into two groups of 24 Ss 
each; each group served in only one of the two 
parts (M or V) of the experiment. 

Procedure. Each S responded for one session of 
five trials on each of two successive days. In addi- 
tion, five volunteer Ss from each group continued 
to respond for a five-trial session on each of the 
ten succeeding working days for a total of 12 sessions, 
or 60 trials in all. 

Each trial consisted of 200 stimulus presentations 
—100 presentations of stimuli from each of the two 
sets of numerals. Each series of 100 stimuli con- 
sisted of ten of each of the symbols in either the 
set of conventional or the set of symbolic numerals. 
The order of symbols in each series was random and 
different for each series. 
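The construction of a stimulus series described above can be sketched in a few lines. This is only an illustration: the original sequences were set by hand on the toggle switches of the programming unit, and the function names and the use of a seeded pseudo-random generator are assumptions made here.

```python
import random
from collections import Counter

SYMBOLS = list(range(10))  # the ten numerals of one set


def make_series(rng: random.Random, repeats: int = 10) -> list[int]:
    """Return one series: each symbol `repeats` times, in random order."""
    series = SYMBOLS * repeats
    rng.shuffle(series)
    return series


def make_trial(rng: random.Random) -> dict[str, list[int]]:
    """Return one trial: a series of 100 stimuli for each numeral set."""
    return {
        "conventional": make_series(rng),
        "symbolic": make_series(rng),
    }


rng = random.Random(1957)  # fixed seed so the sketch is repeatable
trial = make_trial(rng)
print(Counter(trial["conventional"]))  # every symbol occurs ten times
```

Each call to `make_series` reshuffles, so the order differs from series to series while the frequency of each symbol stays fixed at ten.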

The 12 odd-numbered Ss in each group responded 
to the conventional numerals during the first half 
of each trial in the first session, and during the 
second half of each trial in the second session; dur- 
ing the remainder of their trials, they responded to 
the symbolic numerals. The even-numbered Ss re- 
sponded to the two sets of numerals in a comple- 
mentary order. 

A short rest period of about 2-min. duration was 
given between the two halves of each trial and also 
between the trials in each session. Before each half 
of his first trial, each S was shown the ten symbols 
comprising the set with which he was to work. No 
further familiarization with the numerals was given. 

In Part M, S was instructed to respond by press- 
ing a corresponding finger key whenever a symbol 
appeared. In Part V, S was instructed to respond 





Symbolic and Conventional Arabic Numerals 


verbally by calling out the numeral displayed. In 
both parts of the experiment, S was instructed to 
respond as rapidly as he could, but to maintain as 
far as possible an error-level below 5 per cent of 
the responses in each trial. 


Results 


The data obtained were summarized in the 
form of stimulus-response matrices—one ma- 
trix for each 100 responses made by one S to 
one set of numerals during one of the trials. 
From each such matrix, the amount of infor- 
mation transmitted (in bits/stimulus) was 
computed with procedures that have been de- 
scribed elsewhere by Shannon and Weaver 
(14) and others (6, 13). 

Each of these scores was then divided by 
the total time S had taken in making his 100 
responses, and the amount of information 
transmitted by the average S (in bits/sec.) 
was computed as the arithmetic mean of the 
scores for individual Ss. The results obtained 
in the two parts of the experiment are shown 
in Fig. 2 for the two sessions during which 
24 Ss participated with each mode of re- 
sponse. 
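The computation of transmitted information from a stimulus-response matrix can be sketched as follows. The sketch uses the standard relation T = H(S) + H(R) - H(S, R), with entropies estimated from the cell counts. The conversion to bits/sec. assumes that the per-stimulus figure is multiplied by the number of responses before division by the total time; the text does not spell out that step, and the function names are illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def transmitted_bits(matrix):
    """Information transmitted, in bits per stimulus.

    matrix[i][j] is the number of times stimulus i drew response j.
    """
    row_totals = [sum(row) for row in matrix]        # stimulus frequencies
    col_totals = [sum(col) for col in zip(*matrix)]  # response frequencies
    joint = [c for row in matrix for c in row]       # joint frequencies
    return entropy(row_totals) + entropy(col_totals) - entropy(joint)


def bits_per_second(matrix, total_seconds):
    """Rate of transmission, assuming bits/stimulus x responses / time."""
    n_responses = sum(sum(row) for row in matrix)
    return transmitted_bits(matrix) * n_responses / total_seconds


# Error-free responding to ten equally frequent numerals.
perfect = [[10 if i == j else 0 for j in range(10)] for i in range(10)]
print(transmitted_bits(perfect))
print(bits_per_second(perfect, 70.0))
```

With error-free responding the transmitted information equals the stimulus entropy, log2 of 10 bits per stimulus; errors lower it, and responding unrelated to the stimuli drives it to zero.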

The statistical significance of the differences here (and throughout
the experiment) was tested by use of a nonparametric method (4). Sign
tests made at each of the successive trials indicated the following:
(a) With verbal responses, the amount of information transmitted with
the conventional numerals was significantly greater than that
transmitted with the symbolic on each of the ten trials (P < .001 in
each case). (b) When motor responses were made, however, only the
differences obtained in the first and second trials were
statistically significant (P < .01 and P < .05, respectively).

Fig. 2. Information transmitted by the average S in making verbal and
motor responses to conventional and symbolic Arabic numerals as a
function of practice. Each data point is based upon the responses of
each of 24 Ss to 100 stimulus presentations (or 2,400 responses per
point). Different groups of Ss were used in making the verbal and
motor responses.


Table 1

Mean Time and Mean Number of Errors Per 100 Verbal and Motor
Responses to Conventional and to Symbolic Arabic Numerals
During Two Sessions

                       Mean Time per              Mean Number of Errors
                       100 Responses (sec.)       per 100 Responses
                       Session I   Session II     Session I   Session II
Verbal Responses
  Conventional
  Symbolic
  P of difference
Motor Responses
  Conventional
  Symbolic
  P of difference

Note.—Motor responses were made by 24 Ss in Part M, and verbal
responses were made in Part V by a different group of 24 Ss. During
each session, each S responded to five trials of 100 stimulus
presentations from each of the two sets of numerals. Thus, each of
the eight time (or error) means reported is based upon 5 X 24 = 120
trials of 100 stimulus presentations per trial, or upon 12,000
responses in all.
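A sign test of the kind described above can be sketched as follows. The sketch assumes a two-sided test on the number of Ss whose difference favors each set of numerals, with ties dropped; the source does not state whether one- or two-sided probabilities were used, and the function name is illustrative.

```python
from math import comb


def sign_test_p(n_plus: int, n_minus: int) -> float:
    """Two-sided sign-test probability under a 50:50 null hypothesis.

    n_plus and n_minus are the numbers of Ss whose difference falls in
    each direction; tied Ss are assumed to have been dropped.
    """
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    # Probability of a split at least as lopsided as k in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical splits among 24 Ss, for illustration only.
print(sign_test_p(24, 0))
print(sign_test_p(20, 4))
print(sign_test_p(12, 12))
```

With 24 Ss, a unanimous split gives a probability far below .001, whereas with only five Ss even a unanimous split cannot fall below .0625 on a two-sided test.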

Separate summaries of the time and error 
aspects of performance are presented in 
Table 1. These data indicate that, when 
verbal responses were made, the conventional 
numerals were used with greater speed and 
accuracy than were the symbolic numerals; 
this was true for both sessions. With motor 
responses, however, only during the first ses- 
sion were the conventional numerals used 
with greater speed than the symbolic, and 
during the second session the symbolic nu- 
merals were apparently used with greater ac- 
curacy than were the conventional; during 
both sessions, however, the differences in ac- 
curacy were relatively small with the motor 
responses. 

Effects of longer-term practice. Ten Ss (five each in Parts M and V)
continued responding for a total of 12 sessions in order that data
might be obtained under conditions of longer-term practice. The
information-handling performances of these Ss are shown in Fig. 3.

Fig. 3. Information transmitted by the average S in making verbal and
motor responses to conventional and symbolic Arabic numerals as a
function of long-term practice. Each data point is based upon the
responses of each of five Ss to five trials of 100 stimulus
presentations (or 2,500 responses per point). Different groups of Ss
were used in making the verbal and motor responses.

Tests of statistical significance applied to 
these data indicated the following: (a) When 
verbal responses were made, the amount of 
information transmitted with the conven- 
tional numerals was significantly (P < .05) 
greater than that transmitted with the sym- 
bolic on all but two of the 12 sessions (Ses- 
sions 6 and 10); in both sessions, however, 
the direction of the difference also favored 
the conventional numerals. (b) When motor 
responses were made, only the differences ob- 
tained in two sessions (Sessions 5 and 12) 
were statistically significant (P < .05), and 
whereas the direction of one favored one type 
of numeral, the direction of the other favored 
the other type of numeral. 

It was found in averaging the verbal-re- 
sponse data of the first and the last six ses- 
sions, that all five Ss transmitted more infor- 
mation with the conventional numerals than 
with the symbolic. Only three of the five 
Ss making motor responses transmitted more 
information with the conventional numerals 
during the first six sessions, however, and 
during the last six sessions four transmitted 
more information with the conventional than 
with the symbolic numerals. 

The trend with continued practice appears 


Earl A. Alluisi and Hugh B. Martin 


to be, then, that (a) the apparent superiority 
of the conventional over the symbolic Arabic 
numerals is retained when verbal responses 
are made, and (b) even though there is no 
difference between the numerals in earlier 
motor-response performance, the same sort 
of superiority may become apparent after 
fairly long periods of practice. 

Separate summaries of the time and error 
scores are presented in Table 2 for these data 
based on longer-term practice. According to 
these data, the conventional numerals were 
used with greater speed and greater accuracy 
than the symbolic when verbal responses were 
made; this was true for both the first and the 
last six sessions. Although they were also 
used with greater speed during both sets of 
sessions with the motor responses, the con- 
ventional numerals were used with less ac- 
curacy during the first set of six sessions; 
the difference was in the same direction, but 
not statistically significant, during the second 
set of six sessions. 


Discussion 


On an a priori basis, it seemed reasonable 
to suppose that information-handling perform- 
ance with the symbolic numerals might have 


Table 2 


Mean Time and Mean Number of Errors Per 100 Verbal 
and Motor Responses to Conventional and 
to Symbolic Arabic Numerals 
During 12 Sessions 


Mean Number of 
Errors per 
100 Responses 





Mean Time per 
100 Responses 
(sec.) 


Sessions Sessions 
1-6 7-12 


Sessions 
1-6 


Sessions 
7-12 

Verbal Responses 
Conventional 63.2 


Symbolic 67.1 
P of difference * 


0.307 
0.693 
* 


Motor Responses 

Conventional 81.9 70.7 4.420 
Symbolic 82.9 72.0 3.587 
P of difference * ° * 


4.547 
4.500 





Note.— Motor responses were made by five Ss in Part M, 
and verbal responses were made in Part V by a different group 
of five Ss. During each session, each S responded to five trials 
of 100 stimulus presentations from each of the two sets of 
numerals. Thus, each of the eight time (or error) means reported 
is based upon 5 X 5 X 6 = 150 trials of 100 stimulus 
presentations per trial, or upon 15,000 responses in all. 

at least equaled that with the conventional 
numerals. This seemed especially likely in 
view of (a) the improved likenesses obtained 
from the eight-element over the six-element 
matrix (5), (b) the selection of optimal symbols 
from the eight-element matrix (12), and 
(c) the evidence apparently favoring straight- 
line and angular figures over more conven- 
tional curved-line figures (3, 7, 15). The 
data of the present experiment did not cor- 
roborate this supposition. 

Instead, an interaction between the two 
types of numerals and the two modes of re- 
sponse, first noted in a previous study (1), 
was again evidenced here. In terms of in- 
formation handling (bits/sec.), time, and er- 
rors, performance with the conventional nu- 
merals was consistently superior to perform- 
ance with the symbolic numerals when verbal 
responses were made. No such clear superi- 
ority for either set of numerals was evidenced 
in the motor-response performances. 

During the very earliest stages of practice 
with the motor responses, the conventional 
numerals appeared to be superior to the sym- 
bolic in terms of information-handling per- 
formances. This difference did not appear 
during later stages of practice, but there was 
a suggestion that the initial superiority might 
again be found after considerable practice. 

The initial superiority of conventional nu- 
merals with motor responses appeared to be 
a function of a superiority in speed, rather 
than a function of a difference in accuracy. 
During the later stages of practice with the 
motor responses, there appeared to be only 
small differences between the two sets of nu- 
merals, and these differences continued in the 
direction of greater speed, but less accuracy 
in performance with the conventional numer- 
als than with the symbolic. 

One might speculate as to the cause of this 
interaction and the failure to corroborate the 
initial supposition of no difference in perform- 
ance with the two sets of numerals. Two hy- 
potheses are suggested. 

First, an argument might be offered in 
terms of stimulus-response compatibility ef- 
fects. Because number-naming responses to 
conventional Arabic numerals are greatly 
overlearned in our culture, it might be argued 


that they form a highly compatible stimulus- 
response ensemble—so compatible, in fact, 
that any perceptible change in the figures (as 
in forming the symbolic numerals) results in 
a less-compatible ensemble (as measured by 
lower performance). 

On the other hand, the key-pressing re- 
sponses are less practiced relative to the ver- 
bal responses and form, therefore, a less-com- 
patible ensemble with either set of numerals 
(i.e., performance with motor responses lower 
than with verbal). Also, the differences be- 
tween the less-compatible ensembles formed 
with motor responses should be less affected 
by changes in stimulus figures (i.e., the dif- 
ference between performances with the two 
sets of numerals should be smaller with mo- 
tor responses than with verbal). This argu- 
ment is consistent with the data. 

A second line of argument would suggest 
that performance with straight-line and an- 
gular figures is superior to performance with 
conventional numerals only under difficult or 
threshold-like viewing situations in which 
probability of correct identification, and not 
response time, is measured. This might be 
taken as an explanation of the differences be- 
tween the findings with verbal responses in 
the present study and the earlier findings of 
Tinker (15), Berger (3), and Lansdell (7). 
It would not explain the interaction between 
the types of numerals and responses found in 
the present and a previous (1) study. 

Both hypotheses appear reasonable; both 
are consistent with the data. Additional data 
are needed, however, before either can be 
said to be valid. 


Summary and Conclusions 


This experiment was designed to compare 
the information-handling performance of Ss 
in making verbal and motor responses to two 
sets of Arabic numerals—one a set of con- 
ventional figures, the other a set of symbolic 
figures drawn from an eight-element straight- 


line matrix. The motor (key-pressing) re- 
sponses to the different stimuli were made by 
a group of 24 Ss over a period of two days, 
and by five Ss over a longer period of 12 
days. An identical number of different Ss 
made verbal (number-naming) responses for 
periods of the same length. 

When verbal responses were made, the con- 
ventional numerals were consistently superior 
in performance to the symbolic numerals. 
This was true whether performance was meas- 
ured in terms of information handling (in 
bits/sec.), time, or errors. No such clear su- 
periority was evidenced for either set of nu- 
merals when motor responses were made. 

It was suggested that this interaction of 
numeral type with response mode might be a 
stimulus-response compatibility effect result- 
ing from use of the much-practiced ensemble 
of number-naming responses to conventional 
Arabic numerals. It was also hypothesized, 
considering the data of other investigators, 
that performance with straight-line and an- 
gular figures should be superior to perform- 
ance with conventional numerals under diffi- 
cult or threshold-like viewing situations as, 
for example, in visibility studies, but not 
necessarily superior under speeded-response 
conditions with stimuli above threshold. 

With regard to practical applications, the 
numerals formed by the use of an eight-ele- 
ment “printing” matrix do not appear to be 
quite as satisfactory as standard AND-10400 
numerals. They should not be used if other 
considerations are equal, but should their use 
be dictated by expediency the result should 
be only a small drop in information-handling 
performance. 


Received April 5, 1957. 
Officers in the Comptroller and Personnel 
Fields * 
The Industrial Relations Center at the University 
of Minnesota has undertaken a series 
of studies concerning the measured interests 
(Strong Vocational Interest Blank) of Air 
Force Officers in the Personnel and Comp- 
troller-Accountant Fields. The initial study 
in the series showed that neither officer group 
had measured interest patterns similar to 
those of their civilian counterparts (1). This 
finding suggested the possibility of classifying 
individual profiles within a given military oc- 
cupational area as Like or Unlike the av- 
erage profile of a civilian criterion group in 
the same occupational area. Subsequently, 
it could then be determined if the Like and 
Unlike groups differed on personal history 
and career data information. Of special significance 
for the present report are differences 
between Like and Unlike groups on items re- 
lating to satisfactory vocational adjustment. 

The present study (2) is an attempt to answer 
the following question: To what degree 
does the Strong Vocational Interest Blank 
reflect satisfactory vocational adjustment for 
officers in the comptroller and personnel fields? 


The Sample 


All officers who were stationed in the 
United States as of 1 August 1952 and had 
Director of Personnel or Comptroller or Ac- 


1 This study was supported in part by the United 
States Air Force under Contract Number AF 18(600) - 
337, monitored by Human Resources Research Insti- 


tute, Maxwell Air Force Base, Alabama. Permission 
is granted for reproduction, translation, publication, 
use and disposal in whole and in part by or for the 
United States Government. This is not an official 
publication under the contract. Views or opinions 
expressed or implied herein are not to be construed 
as necessarily reflecting the views or indorsement of 
the Department of the Air Force or of the Air Re- 
search and Development Command. 

2 Assistance during various phases of the research 
from Paul G. Jenson, Harry E. Roadman, and Ernest 
L. McCollum is gratefully acknowledged. 


countant-Auditor Staff as their primary Air 
Force Specialty (AFS) designation, first ad- 
ditional AFS, or duty AFS designation were 
sent Strong Vocational Interest Blanks and 
personal history questionnaires. Random sam- 
ples of 600 officers in the AFS’s of Personnel 
Staff Officer and Personnel Officer also re- 
ceived the material. 

Table 1 gives the number of officers in each 
AFS to whom the material was sent, the re- 
turn, the percentage of return, and the num- 
ber of usable returns. The correction factor 
includes those officers whose materials were 
returned because of incorrect addresses, death 
of the officer, etc. The Corrected N Sent in- 
cludes just those officers who presumably re- 
ceived the materials. 

Not all returns could be used. Of the 
1,470 officers who returned the material, in- 
formation from 72 could not be used. This 
was 4.9 per cent of the total returned. The 
most frequent reason for nonusable returns 
was not completing correctly the Strong Vo- 
cational Interest Blank or the Personal Data 
Sheet. The next largest group consisted of 
those officers who reported no AFS and no 
duty assignments in the areas under study. 
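The return figures described above follow from simple arithmetic, sketched here. The Comptroller figures (232 sent, correction of 7, 187 returned) are taken from Table 1 as it can be read in this copy; the function names are illustrative only.

```python
def corrected_n(n_sent: int, correction: int) -> int:
    """Officers who presumably received the materials."""
    return n_sent - correction


def return_percentage(returned: int, n_sent: int, correction: int) -> float:
    """Percentage of return, based on the corrected number sent."""
    return 100 * returned / corrected_n(n_sent, correction)


def unusable_percentage(unusable: int, total_returned: int) -> float:
    """Percentage of the returned material that could not be used."""
    return 100 * unusable / total_returned


print(corrected_n(232, 7))
print(round(return_percentage(187, 232, 7), 1))
print(round(unusable_percentage(72, 1470), 1))
print(1470 - 72)  # usable returns over all AFS groups
```

The last two lines reproduce the figure of 4.9 per cent given in the text and the corresponding number of usable returns.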

The survey returns, then, contained two 
kinds of research data: completed Hankes an- 
swer sheets for the Strong Vocational Inter- 
est Blank and responses to a personal history 
questionnaire. The personal history blank 
requested information about the following: 


1. Age 

2. Marital status 

3. Number of dependents 

4. Highest year completed in school 

5. College degrees 

6. College major 

7. Years of civilian comptroller or person- 
nel experience 

8. Date of first entry into military service

9. Total active commissioned service

10. Duty AFSC

11. AF Component (Regular, Reserve, etc.)

12. Military rank

13. Time since last promotion

14. Comptroller or personnel training in the military

15. Comptroller or personnel experience in the military

16. Choice of AF duty

17. Choice of civilian occupation if released from AF duty


Table 1

Summary of Returns

                                           Corrected                          No. of Usable
AFS                   N Sent   Correction    N Sent    Returned   Percentage     Returns
Comptroller              232        7          225        187        83.1
Acc’t-Aud. Staff          95                    90         75        83.3
Director of Pers.        326       13          313        253        80.8
Pers. Staff Officer      600                              477        83.5
Pers. Officer            600                                         85.8
Total                  1,853                                         83.7


Methods and Procedure


The SVIB profiles of both the comptroller and personnel groups were
separated into Like and Unlike groups on the basis of profile
similarity to their civilian counterparts. Strong’s criterion groups
of accountants and personnel directors provided standards for this
separation (3).

By using the mean standard score profile of Strong’s criterion group
of accountants, upper and lower cutting scores were established on
six occupational scales: C.P.A., Senior C.P.A., Accountants, Office
Man, Purchasing Agent, and Banker. These six scales were chosen
because they are the scales on which Strong’s civilian accountants
had their highest average scores.


Table 2

Classification of SVIB Profiles According to Similarity
to an Appropriate Criterion Profile

                   Comptroller Group       Personnel Group
Classification       N     Percentage        N     Percentage
Like                78        32.1          464       40.2
Indeterminate      108        44.4          339       29.3
Unlike              57        23.5                    30.5
Total              243       100                     100

The following procedure was used to obtain cut- 
ting scores for determining the comptroller Like and 
Unlike groups. The mean standard score of the cri- 
terion accountants on these six scales was computed; 
this was a standard score of 43. Any officer whose 
mean standard score on these six scales was 43 or 
above was classified in the Like group. The lower 
cutting score was set at one standard deviation be- 
low the mean standard score of the criterion group 
of accountants on the six scales. This resulted in 
all the profiles with a mean standard score of 33 or 
less on these six scales being classified in the Unlike 
group. Those cases between the high and low cut- 
ting score were considered “Indeterminate.” 
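
The cutting-score rule for the comptroller group can be written out as a short sketch. The cutting scores of 43 and 33 come from the text; the function name and the example officers are hypothetical and are not data from the study.

```python
from statistics import mean

# Cutting scores stated in the text: 43 is the criterion-group mean on the
# six scales, 33 is one standard deviation below it.
UPPER_CUT = 43
LOWER_CUT = 33


def classify_profile(scale_scores, upper=UPPER_CUT, lower=LOWER_CUT):
    """Classify an officer from his standard scores on the six scales."""
    m = mean(scale_scores)
    if m >= upper:
        return "Like"
    if m <= lower:
        return "Unlike"
    return "Indeterminate"


# Hypothetical officers, for illustration only.
examples = {
    "officer_a": [45, 47, 44, 41, 43, 46],
    "officer_b": [30, 28, 35, 33, 31, 29],
    "officer_c": [40, 38, 36, 41, 39, 37],
}

for name, scores in examples.items():
    print(name, round(mean(scores), 1), classify_profile(scores))
```

The same function would serve for the personnel group if it were given the two Personnel Director and Public Administrator scores and that group's own cutting scores, which the text does not state.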

Similarly, by using the mean standard score pro- 
file of Strong’s Personnel Director criterion group, 
upper and lower cutting scores were established on 
two occupational scales: Personnel Director and Pub- 
lic Administrator. These two scales were chosen for 
the personnel group because they are the scales on 
which Strong’s civilian personnel directors scored 
highest on the average. The Like and Unlike groups 
of personnel officers were obtained by the same pro- 
cedure described for the accountant group but with 
respect to the Personnel Director and Public Ad- 
ministrator Scales. Again, the cases between the 
high and low cutting scores were considered as
“Indeterminate.” Table 2 shows the resulting classification
of the 1,398 profiles.

Strong designated two primary uses for his interest
inventory, and consequently has used a different criterion
for the evaluation of each. First, he proposed that men
engaged in an occupation have characteristic interest
patterns that differentiate them from men in other
occupations. Strong offers a wealth of data (3, Chaps. 7-9)
to support the validity of this use of the SVIB. The second
use of the SVIB was to predict the “satisfactory
occupational adjustment” of a man. Interest research
workers have generally used two factors, “continuance in an
occupation” and “expressed vocational preference,” to
evaluate this latter use of the SVIB. The criterion of
“continuance in an occupation” has the disadvantage that,
considering personal and economic pressures, not all who
desire to leave or to enter a vocation are free to do so.
“Expressed vocational preference” has the disadvantage of
the instability common to most such single statements.

Two items in this study, choice of Air Force duty 
and choice of a future civilian occupation, would 
seem to be particularly stable “preference” factors 
because: (a) the men are mature, as evidenced by 
a median age of 41.5 years for the comptroller group 
and 34.8 years for the personnel group; (b) the men 
are familiar with their occupations, as shown by a
median of 3.2 years of military experience for the
comptroller group and 3.6 years for the personnel group,
in addition to whatever civilian experience they might
have had; and (c) the two items combined required 
the subject to project himself over two different life 
and work situations, duty in the Air Force and a 
subsequent civilian job. These factors seem, in con- 
trast to other interest studies discussed by Strong 
(3, p. 389), to provide a more stable criterion of in- 
terest measurement than such a measure as the vo- 
cational choice of an adolescent high school or col- 
lege student who is relatively unfamiliar with the 
myriad of occupations in the world of work. 

The adopted procedure, then, was to determine the 
relationship between the two preference items (Air 
Force duty and choice of civilian occupation) and 
measured interest patterns. 


Results and Discussion


In response to the personal history item,
“If you had your choice of Air Force duty,
which duty would you choose?”, 64 per cent
of the total comptroller sample and 53 per
cent of the total personnel sample expressed
a preference for duty in their present
occupational field. Table 3 shows that comptrollers
with interest patterns like successful civilian
accountants stated a preference for the
comptroller occupational field much more
frequently than those with interest patterns
unlike successful civilian accountants. This
difference is significant at the .0001 level.
Within the personnel group the relationship
between “choice of Air Force duty” and
“Like-Unlike” profile classification was not
statistically significant.
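
The source does not name the test behind the .0001 level. As a rough check, the sketch below assumes a two-sided two-proportion z test with a pooled standard error, and recovers the counts from the Table 3 percentages and group sizes (80.8 per cent of 78 and 40.3 per cent of 57). Both the choice of test and the recovered counts are assumptions of this sketch.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p) for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


like_n, unlike_n = 78, 57
like_x = round(0.808 * like_n)      # officers in the Like group choosing comptroller duty
unlike_x = round(0.403 * unlike_n)  # officers in the Unlike group choosing comptroller duty

z, p_value = two_proportion_z(like_x, like_n, unlike_x, unlike_n)
print(like_x, unlike_x, round(z, 2), p_value < 0.0001)
```

Under these assumptions the difference clears the .0001 level, which is consistent with the probability level reported for the comptroller group.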

In response to the personal history item, 
“If you were released from the Air Force, 
what civilian occupation would you like to be 
engaged in?”, 39 per cent of the total comp- 
troller sample and 36 per cent of the total 
personnel sample expressed a preference for 
their present occupation if released from the 
Air Force. Table 3 shows that the Like 
groups (both comptroller and personnel) state 
a preference for their present occupation much 
more frequently than the Unlike groups. 
Both differences are significant at the .0001 
level. 

Table 4 shows the relationship between the 
two preference items combined and measured 
interest patterns for both the comptroller and 
personnel groups. 

The results do not attest to the adequacy
of the combined criterion. What they do
indicate is that if the combined criterion can be
presumed to be adequate on the basis of the
preceding discussion, there is evidence for the
validity of the SVIB in predicting “satisfactory
vocational adjustment” for the military
occupational population represented by this
sample of officers.


Table 3

Percentage Relationships and Probability Levels Between Like and Unlike
Groups and Two Vocational Preference Items for Comptroller and
Personnel Officers

                                             Total        Like         Unlike
                                             Sample       Group        Group         Probability
                                                                                     Level
Comptroller Group                           (N = 243)    (N = 78)     (N = 57)
  Choice of Air Force duty
    (percentage choosing comptroller duty)     64.0         80.8         40.3        P < .0001
  Choice of Civilian Occupation
    (percentage choosing comptroller occup.)   39.0         61.5       [illeg.]      [illeg.]

Personnel Group                             (N = 1155)   (N = 464)    (N = [illeg.])
  Choice of Air Force duty
    (percentage choosing personnel duty)       53.0       [illeg.]     [illeg.]      [illeg.]
  Choice of Civilian Occupation
    (percentage choosing personnel occup.)     36.0       [illeg.]     [illeg.]      [illeg.]


Table 4

Relationship of Interest Profiles to a Combined Criterion

                                             Total        Like         Unlike        Probability
Description                                  Sample       Group        Group         Level
Percentage of Comptroller sample choosing
  Comptroller for both Air Force duty and
  civilian occupation                          33.3       [illeg.]     [illeg.]      [illeg.]
Percentage of Personnel sample choosing
  Personnel for both Air Force duty and
  civilian occupation                          26.8       [illeg.]     [illeg.]      [illeg.]

Other legible entries, row and column uncertain: 11.5, 18.2, P < .001


Summary and Conclusions


This investigation attempted to discover
the relationships between measured interest
patterns (SVIB) and satisfactory vocational
adjustment for Air Force officers in the
Comptroller-Accountant and Personnel Fields. The
conclusions which seem warranted on the basis
of the research findings are:

1. The Strong Vocational Interest Blank reflects
the degree of satisfactory vocational adjustment
for Air Force officers in the comptroller
field. This is shown in three ways. A
significantly larger proportion of the group with
measured interests similar to those of successful
accountants in business and industry
state: (a) a preference for Air Force duty in
the comptroller specialties; (b) a preference
to engage in comptroller occupations in civilian
life if released from Air Force duty; and
(c) a preference to engage in comptroller
occupations for both Air Force duty and civilian
occupation, as compared with the group
whose measured interest patterns are not
similar to successful civilian accountants.

2. The Strong Vocational Interest Blank
reflects the degree of satisfactory vocational
adjustment for Air Force officers in the personnel
field. This is shown in two ways. A
significantly larger proportion of the group with
measured interests similar to those of successful
personnel directors in business and industry
state: (a) a preference to engage in personnel
occupations in civilian life if released
from Air Force duty; and (b) a preference to
engage in personnel occupations for both Air
Force duty and civilian occupation, as compared
to the group whose measured interest
patterns are not similar to successful civilian
personnel directors.

3. Such evidence justifies the conclusion 
that measured interests should receive in- 
creased emphasis as a factor in military se- 
lection and classification procedures for Air 
Force officer specialists. 


Received April 5, 1957.

A major objective of the forced-choice scale
is the control of transparency and hence of 
biasability. The vehicle of control is the 
equivalence of general desirability which ob- 
tains for items within a set (pair, triad, etc.). 
Since alternatives are equally favorable, S is
presumably deprived of the opportunity to 
describe himself in a consistently favorable 
manner. Several studies (3, 6, 8) have shown 
that this is a tenable assumption; Ss do not 
improve their scores on forced-choice scales 
when instructed to describe themselves in the 
most favorable light. However, Gordon (4) 
finds that gains are made on two scales of the 
Gordon Personal Profile when scores from 
guidance and employment conditions are com- 
pared, while scores on the two remaining keys 
decrease. This suggests that S’s motivation 
in an employment situation may operate more 
specifically than a tendency to describe him- 
self favorably. A given situation may sug- 
gest that certain qualities are relevant, some 
of which may relate to keys of the inventory. 
Two studies on the Jurgensen Classification 
Inventory (8, 9) show that when Ss are 
asked to describe themselves as self confident, 
scores on a self confidence key increase sig- 
nificantly. While the mean score on this key 
does not increase when Ss are asked to as- 
sume that they are taking the test as part 
of a selection procedure, the correlation be- 
tween guidance and selection sets is but .50, 
indicating that scores change, some upward, 
some downward (8). For the purposes of 
selection, any change from the true score is 
undesirable. 

The present study investigates the effect of
introducing knowledge of an employer’s objectives
into an “assumed selection” situation,
where this knowledge is relevant to some
scale of the inventory employed. Like other
studies which utilize “assumed selection” sets,
the study may be criticized as lacking realism
and consequently as lacking relevance. The
evidence for a defense consists of the specific
instructions given and the reports of Ss
concerning their behavior.


1 Some of the data of this study were presented at
the 1957 meeting of the Midwestern Psychological
Association.

2 The author wishes to thank Sarah Ann Beltz for
performing most of the computations.


Procedure 


The Ss were 46 junior men in a college of engi- 
neering. Their participation in a two-hour testing 
session partially fulfilled a requirement of the in- 
troductory psychology course. Each S completed the 
Ghiselli Self Description Inventory under each of 
seven conditions. This scale, which is described else- 
where (1), consists of 64 pairs of adjectives, half of 
the pairs presenting two favorable terms, the other 
half presenting two unfavorable terms. Items re- 
ceive weighted scores on five empirically derived 
keys. The intelligence key is composed of 36 items, 
with a maximum score of 70; initiative, 17 items 
and 51 points; self-assurance, 31 items, 48 points; 
supervisory qualities, 24 items and 54 points; occu- 
pational level, 20 items and 65 points. About half 
of the items of the inventory are scored on more 
than one key. The following instructions were 
given: 

“You will be asked to complete this inventory 
several times. Later in the period, I will answer 
questions about the procedures, but for the mo- 
ment I will ask you simply to listen to the instruc- 
tions and follow them as closely as possible.” 

Set I. “Read the instructions at the top of the 
page. I would like you to describe yourself as ac- 
curately as possible using the pairs of adjectives in 
the manner indicated.” Upon completion of each 
set, answer sheets were collected. 

Set II. “I would like you to assume that you are
applying for a job in which you have some interest.
As one part of the selection procedures, you are
asked to complete this inventory. Assume that other
test scores will be considered, along with your college
transcript, interview report, letters of recommendation,
etc. The organization to which you are
applying advertises that it is looking for young men
with initiative. I realize that you are not in the
situation described; I am asking you to imagine that
you are, and to act as you believe you would.”
(The question was asked, “Do you mean that we
should cheat?” The answer given was, “I do not.
I have no way of knowing what you would do in
this situation. I am asking you to decide what you
would do, and then to do it. Remember, this inventory
is but one part of the selection apparatus.”)

Set III. Same as II, with the substitution of “the 
organization is looking for intelligent people,” for 
“the organization advertises that it is looking for 
young men with initiative.” 

Set IV. Same as II and III, with the substitution 
of “the organization is looking for men with self- 
assurance.” Upon completion of Set IV, Ss were 
asked about their behavior on the three preceding 
administrations. This will be referred to later. 

Set V. “Considering all the people with whom 
you are well acquainted, select one person who, in 
your opinion, possesses the most initiative. Spend 
some time thinking about this, and decide on the 
person you would rank first among all acquaintances 
on this trait. I do not want a hypothetical possessor 
of a trait; I want a real person. Having selected 
the person, forget about the trait, and describe this 
person as accurately as you can.” 

Set VI. Same as V, but “most intelligent per- 
son” described. 

Set VII. Same as V and VI, but “most self-as- 
sured person” described. 


Analysis and Results 


All answer sheets were scored on each of 
three “relevant” keys (initiative, intelligence, 
self-assurance) and on an “irrelevant” key 
(occupational level). The resulting score 
matrices (46 Ss by seven sets) provided the 
basic data for analysis. Mean scores and 
standard deviations are presented in Table 1. 
Bartlett’s test indicated that variances were 
homogeneous, permitting an analysis of vari- 
ance. The analysis of variance is summa- 
rized in Table 2. All F’s in this table are 
significant at the .01 level. 
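
The F ratios summarized in Table 2 can be recomputed from the mean squares that remain legible there. The sketch below assumes the usual subjects-by-sets layout, in which each mean square is tested against the residual mean square. It deliberately does not say which scoring key each block of mean squares belongs to, since this sketch does not rely on the column headings of the scan.

```python
# Degrees of freedom implied by 46 Ss and seven sets.
N_SUBJECTS, N_SETS = 46, 7
DF_SETS = N_SETS - 1                 # 6
DF_SUBJECTS = N_SUBJECTS - 1         # 45
DF_RESIDUAL = DF_SETS * DF_SUBJECTS  # 270

# (sets, subjects, residual) mean squares as legible in Table 2.
mean_squares = [
    (449.45, 74.84, 36.79),
    (173.40, 125.46, 31.96),
    (340.20, 185.83, 43.85),
    (256.15, 85.04, 20.37),
]


def f_ratios(ms_sets, ms_subjects, ms_residual):
    """F for sets and F for subjects, each tested against the residual."""
    return ms_sets / ms_residual, ms_subjects / ms_residual


for block in mean_squares:
    f_sets, f_subjects = f_ratios(*block)
    print(block, round(f_sets, 2), round(f_subjects, 2))
```

The three F values that survive in the scan (7.76, 4.24, and 4.17) are reproduced by the third and fourth blocks, which supports reading each triple as one analysis.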


Since for each key the lowest score is ob- 
tained from “accurate self description,” sig- 
nificant increases are associated with some of 
the induced sets. Table 3 presents the values 
with which differences between means may 
be evaluated. Inspection of Table 1 in this 
light indicates that both classes of set (as- 
sumed selection and description of person 
possessing trait) produce such a shift. A set 
involving either intelligence or initiative re- 
sults in significant increases on all four keys. 
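
The source does not say how the critical values of Table 3 were obtained. One common choice, assumed here, is the least significant difference, t times the square root of 2 MS(residual) / n, with n = 46 Ss per set mean and 270 residual degrees of freedom. The two t values below are approximate two-tailed values for 270 df and are an assumption of this sketch, as is the function name.

```python
import math

N_PER_MEAN = 46   # each set mean is based on 46 Ss
T_05 = 1.969      # approximate two-tailed t, 270 df, .05 level (assumed)
T_01 = 2.594      # approximate two-tailed t, 270 df, .01 level (assumed)


def critical_difference(ms_residual, t_value, n=N_PER_MEAN):
    """Smallest difference between two set means that reaches significance."""
    return t_value * math.sqrt(2.0 * ms_residual / n)


# Residual mean squares legible in Table 2.
for ms_residual in (20.37, 31.96, 36.79):
    print(
        ms_residual,
        round(critical_difference(ms_residual, T_05), 2),
        round(critical_difference(ms_residual, T_01), 2),
    )
```

With these assumptions the computed values fall within about .02 of the three legible pairs in Table 3 (1.85 and 2.45, 2.32 and 3.07, 2.48 and 3.28), which suggests, but does not prove, that a calculation of this kind lies behind the table.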


Discussion 


Mean scores on sets V, VI, and VII are
essentially indirect indications of validity.
For example, individuals seen by others as
intelligent (Set VI) are described by adjectives
which are weighted on the intelligence
key. As a group, these individuals also score
high in initiative, self-assurance, and
occupational level. However, the correlation
between individuals on the initiative and
intelligence keys for Set VI is only .236, which
agrees with the coefficient reported by Ghiselli
(3, p. 17) of .227. One might agree that
the underlying traits are probably related to
this extent. It is at least possible that overlap
between keys is of no great concern, given
a set to produce valid descriptions. However,
we should note that “most intelligent”
(Set VI) and “most initiative” (Set V) Ss
score higher on self-assurance than do “most
self-assured” Ss (Set VII). This is clearly
not desirable if the inventory is to have
diagnostic significance and suggests the reduction
of overlap between the self-assurance key and
other keys.


Table 1

Means and Standard Deviations*

[Table body not recoverable from the source. The rows were the
Initiative, Intelligence, Self-Assurance, and Occupational Level
keys; the only legible entry is 7.5.]

* Italicized numbers indicate the sets that are relevant for each key.


Table 2

Summary of the Four Analyses of Variance

                  Initiative         Intelligence       Self-Assurance     Occupational Level
                  Mean               Mean               Mean               Mean
Source     df     Square     F       Square     F       Square     F       Square     F
Sets        6     449.45  [illeg.]   173.40  [illeg.]   340.20    7.76     256.15  [illeg.]
Subjects   45      74.84  [illeg.]   125.46  [illeg.]   185.83    4.24      85.04    4.17
Residual  270      36.79              31.96              43.85              20.37

Note. P for all F’s < .01.


Of more concern in this investigation are
the effects associated with the assumed selection
sets (II, III, IV). These may be stated
briefly. First, when a set is established via
a statement about employer objectives (i.e.,
“the organization advertises that it seeks men
with initiative”), scores increase significantly
on a key related to the stated objective. This
is the usual demonstration of transparency;
if one knows what the test attempts to assess,
he can influence his score. There is no
evidence relevant to transparency in the sense
of being able to infer what the test measures.
Ghiselli’s data (3) suggest that this “seeing
through” the test does not occur, and the
present study in no way challenges this.

Second, for Sets II and III, the bias introduced
generalizes to other keys. The pattern
here is identical with that found for the
“valid description” sets, but in this case the
problem would appear more serious. Regardless
of whether the generality is produced
by real correlation of the underlying traits,
by mechanics of inventory and key construction,
or by the varying success which individuals
meet in attempting to beat the test,
the opportunity to score high on some key
by “accident” is a weakness in any situation
where motivation to describe one’s self
accurately is suspect. Heron’s study (5) indicates
that the selection situation may be
so described. Business organizations acquire
reputations, intentionally and fortuitously, for
seeking “bright young men” or “men with
initiative”; individuals may independently arrive
at similar conclusions. Given such information,
an applicant may obtain spurious
scores on some key or keys. Again, as in
Sets V, VI, and VII, the generality of score
increases is not due entirely to high correlations
between keys. Intelligence and initiative
correlate .20, .43, and .42 for Sets II,
III, and IV.

Third, the similarity of results for the two 
classes of set employed suggests the presence 
of a general factor on which items and scales 
have varying loadings. The self-assurance 
scale presumably has less of this general fac- 
tor than have the initiative and intelligence 
scales. This similar effect of the two classes 
of set could, of course, be produced by the Ss 
employing similar operations in the two cases, 
but there is evidence that this is not the case. 
The correlations between relevant sets for 
initiative, intelligence, and self-assurance are 
— .17, .12, and .08. In addition, the discus- 
sion following Set IV suggests that the typi- 
cal procedure for the selection sets was to 
choose the alternative which appeared rele- 
vant to the trait whenever one member of a 
pair was “obviously related” or when neither 
member of a pair was seen as good self-de- 
scription by an S. No S admitted describing 
some other real person in these sets. 

Considered together, the results suggest
that the usual preference index is an insufficient
basis for building item pairs. Within
the forced-choice format, two alternatives
suggest themselves. One would be to equate
items on both preference index and general
factor loading, so that a choice could be
treated in terms of the item’s specific variance
alone.3 A similar purpose might be
achieved by introducing a selection set into
the establishment of the preference index, i.e.,
by obtaining judgments to a question like
“how favorable would this term be as a description
of a job applicant?” As several
writers (6, 7, 11) have indicated, many varieties
of preference index are possible. An
adequate index is one which controls for
sources of error known to exist. Whatever
the value of the alternatives advanced, it is
at least conceivable that the selection situation
may produce special requirements which
can be met by some type of preference index.
It is also possible, of course, that adequate
control will be found outside the forced-choice
approach.


Table 3

Critical Values for Differences Between Means

Columns: Initiative, Intelligence, Self-Assurance, Occupational Level
(three pairs of values are legible; the source does not preserve
which column each pair belongs to)

P < .05      1.85      2.32      2.48
P < .01      2.45      3.07      3.28


Summary and Conclusions 


1. Indirect evidence of validity is presented 
for three scales of the Ghiselli Self Descrip- 
tion Inventory. Persons viewed by others as 
possessing a trait in marked degree receive 
high scores on the scale designed to measure 
that trait. 

2. When a set is introduced which suggests 
that a company is “looking for men with 
...,” scores on the trait named increase 
significantly. In this sense, the inventory is 
transparent. 

3. Bias introduced by a specific set generalizes
to other scales in the inventory. This
is a disturbing influence in use as a selection
instrument, since it increases the number of
potential sources of a high score. This generality
is attributed to the presence of a general
factor which is not controlled in the pairing
of items, and in part to the overlap produced
by keying some items on more than
one scale.

3 In a personal communication, Robert J. Wherry
suggests that items be matched on preference index,
general factor loading, and discrimination index,
while varying on group factor loading. He currently
employs this procedure, thereby constructing scales
which are purely diagnostic, ignoring level.

4. It is suggested that preference index 
alone is an insufficient basis for constructing 
forced-choice pairs, if biasability is to be 
minimized. Suggested alternatives which re- 
tain the forced-choice approach include match- 
ing on general factor loading, or making the 
preference index specific to the selection situ- 
ation. 


Received April 22, 1957.

Limitations on the Use of Strong Sales Keys for Selection
and Counseling

The available Strong sales keys are frequently
used in selection and counseling for
many different types of sales positions. Their 
more or less universal validity appears to be 
generally accepted, despite cautions about the 
need for validating them in any given situa- 
tion (9). Review of the literature, however, 
furnishes little support for the belief in the 
general validity of Strong’s sales keys (4, 5). 
Aside from studies of casualty and life insur- 
ance salesmen (1, 2, 3, 7, 8) and a study of 
17 detergent salesmen (6), there is no evi- 
dence that the Strong sales keys are valid 
predictors of success in other types of sales 
positions. Further, the high intercorrelations 
(.82 to .84) among the sales keys (Life 
Insurance Salesman, Real Estate Salesman, 
Sales Manager) indicate that they are essen- 
tially measuring highly similar sales interest 
patterns (8). In view of the recognized dif- 
ferences among various sales positions, the 
limited evidence on validity suggests the pos- 
sibility that the existing Strong sales keys 
may not be suitable for use in many sales 
selection and counseling situations. 

The present study provided an opportunity 
to investigate this possibility. Earlier unre- 
ported studies had developed two custom- 
built Strong sales keys by an item analysis 
of two different types of salesinen in the same 
company. Cross-validation of these keys in- 
dicated that they were effective in selection. 
The relationships of these two valid sales 
keys to each other and to three Strong sales 
keys (Life Insurance Salesman, Sales Man- 
ager, and Group IX) were then determined. 
The results indicated the similarities and dif- 
ferences in type of sales interest patterns 
measured by these different keys. 


Description of Study 


The two types of salesmen in this study were engaged
in the sale of accounting and data processing
machines on a rental basis (DP) and the sale of
electric typewriters (ET) for the same company.
Since objective criteria of sales success were found
to be unreliable (the r between first and second year
production based on percentage of quota was -.11
for 89 DP salesmen), the criterion used to measure
sales success was survival. All men who completed
a minimum of 18 months after being assigned a sales
territory were considered successful. All who terminated
before this time because of poor performance
were considered unsuccessful. The criterion was the
percentage of men who terminated for each Strong
score level.
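
The survival criterion can be sketched as follows. The 18-month rule and the poor-performance condition come from the text; the record layout, the function names, the handling of men who fit neither definition, and the example records are hypothetical and are not data from the study.

```python
from collections import defaultdict

MIN_MONTHS = 18  # minimum time in a sales territory to count as successful


def outcome(months_in_territory, terminated_for_poor_performance):
    """Apply the survival criterion to one salesman."""
    if months_in_territory >= MIN_MONTHS:
        return "successful"
    if terminated_for_poor_performance:
        return "unsuccessful"
    return None  # neither definition in the text applies; left unclassified here


def termination_rate_by_grade(records):
    """Percentage of classified men terminated, for each letter grade."""
    counts = defaultdict(lambda: [0, 0])  # grade -> [unsuccessful, classified]
    for grade, months, poor_performance in records:
        result = outcome(months, poor_performance)
        if result is None:
            continue
        counts[grade][1] += 1
        if result == "unsuccessful":
            counts[grade][0] += 1
    return {grade: 100.0 * bad / total for grade, (bad, total) in counts.items()}


# Hypothetical records: (letter grade, months in territory, poor-performance termination).
records = [
    ("A", 24, False), ("A", 30, False), ("A", 10, True), ("A", 20, False),
    ("D", 6, True), ("D", 19, False), ("D", 12, False),
]
print(termination_rate_by_grade(records))
```

Men who left before 18 months for reasons other than poor performance are simply dropped in this sketch; the source does not say how such cases were treated.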

The present study was primarily a follow-up on 
the validity of two custom-built sales interest keys 
for new samples of salesmen. Keys for DP and ET 
salesmen had been constructed previously by an 
item analysis of each group. The DP key contained 
199 item responses (128 items), the ET key 193 item 
responses (129 items). These were items with per- 
centage differences between successful and unsuccess- 
ful salesmen significant at the .10 level or better. 
After preliminary cross-validation of these keys, 
letter grades ranging from A to D were established 
and the keys were used by the company in selecting 
DP and ET salesmen. The DP and ET cross-valida- 
tion samples used in this study were men who had 
taken the Strong test before employment and had 
been selected partly on the basis of their Strong 
scores. Since the selection procedure discouraged 
the employment of low scoring applicants, the range 
of Strong scores for these samples was considerably 
restricted. 

The DP cross-validation sample totaled 358
men, the ET sample, 220. The validity of the
DP and ET keys was determined for these groups. 
Random samples of 140 DP and 100 ET salesmen 
were then selected from these groups and the re- 
mainder of the study was carried out on the smaller 
samples. This consisted of scoring them on three 
Strong sales keys (Life Insurance Salesman, Sales 
Manager, and Group IX) and correlating these three 
keys with the two custom-built keys (DP and ET). 


Results 


Table 1 shows the relationship between the
letter grades for the custom-built DP key and
terminations from sales for the total cross-validation
sample of 358 DP salesmen. The
percentage separated increased from 7% for
A’s to 31% for D’s. The differences in percentage
were significant at the .01 level by
the chi-square test.


Table 1

Relationship Between Strong DP Sales Key and
Terminations Due to Poor Performance
for 358 DP Salesmen

Company
Letter       Total          Terminations
Grade        Sample       No.          %
A             104       [illeg.]        7
B             118       [illeg.]        8
C              97       [illeg.]       13
D              39       [illeg.]       31
Total         358       [illeg.]    [illeg.]

Table 2 indicates the validity of the cus- 
tom-built ET key for the total cross-valida- 
tion sample of 220 ET salesmen. The per- 
centage terminated increased from 16% for 
A’s to 52% for D’s. These differences in 
percentage were significant at .02 by chi- 
square test. 

The restrictive effect of the selection stand- 
ards on the range of sales interest for 140 DP 
and 100 ET salesmen is shown in Table 3. 
On Strong’s three sales keys, few of the DP 
or ET salesmen employed scored below B+,
the minimum letter grade generally consid- 
ered indicative of adequate interest in an oc- 
cupation (8). This restriction of range ex- 
plained why Strong’s sales keys had previ- 
ously been found not to be useful for selecting 
DP and ET salesmen. 

Table 4 gives the intercorrelations among
the custom-built DP and ET keys and the
Life Insurance Salesman, Sales Manager, and
Group IX (Sales) keys. The DP and ET
keys were not significantly related. The correlations
of the DP and ET keys with the
three standard Strong keys, however, were all
significantly different from zero at the .01
level. The DP key was positively related to
Strong’s sales keys, while the ET was negatively
related to them.


Table 2

Relationship Between Strong ET Sales Key and
Terminations Due to Poor Performance
for 220 ET Salesmen

[Table body not recoverable from the source. The columns were
Company Letter Grade and Terminations.]


Table 3

Distribution of Letter Grades for Life Insurance Salesman,
Sales Manager, and Group IX (Sales) Strong Keys for
140 DP and 100 ET Salesmen

Strong         Life Insurance        Sales             Group
Letter         Salesman              Manager           IX
Grade          DP       ET           DP       ET       DP       ET

[Letter-grade rows are garbled in the source. Legible fragments,
in scan order: 126, 88, 135, 100 / 16, 10, 12, 1 / 4, 2 / 1, 2 / 5]

Total          140      100          140      100      140      100


Discussion 


The results indicated the validity of the
custom-built sales keys for DP and ET sales
(Tables 1 and 2). In view of the restriction
of range in these samples caused by using
these keys in selection, the true validity was
undoubtedly higher than shown here. Since
earlier efforts to validate the standard Strong
sales keys for these same positions in the
company had failed, this study furnished additional
evidence of the advantage of constructing
custom-built keys for given sales
positions when possible (9). In addition,
other data not reported here showed that the
DP key was not valid for ET sales, or the
ET key for DP sales. Thus, even in the
same company, the two sales positions were
sufficiently different to require separate sales
interest keys.


Table 4

Product-Moment Correlations Between Company and
Published Strong Keys for 140 DP and
100 ET Salesmen

                              140 DP           100 ET
                              Salesmen*        Salesmen**
                              DP Key vs.       ET Key vs.
ET                              -.13
DP                                              [illeg.]
Life Insurance Salesman          .52            [illeg.]
Sales Manager                    .52            [illeg.]
Group IX (Sales)                 .51            [illeg.]

* For 138 df, an r of .17 is significantly different from zero
by t test at the .05 level; .23 at the .01 level.

** For 98 df, an r of .20 is significantly different from zero by
t test at the .05 level; .25 at the .01 level.

The dissimilarity of the DP and ET keys 
was further demonstrated by the lack of a sig- 
nificant relationship between them (Table 4). 
This finding appeared reasonable because of 
differences between DP and ET sales. DP 
salesmen function mainly as accounting sys- 
tem consultants who analyze customers’ ac- 
counting problems and prepare technical pro- 
posals on methods of improving their opera- 
tions by use of accounting machines. After 
getting the order, they continue to provide 
service and technical advice to their custom- 
ers during the rental period. DP salesmen 
often spend months in closing a contract and 
their average orders are frequently quite sizable.
The ET salesmen, on the other hand, operate
differently. Their product is less technical
in nature, and their sales presentation is
based on a demonstration of the operating
features of the electric typewriter.
Their sales are generally made in less time
with fewer calls, and the size of their orders
is usually smaller than those in DP. As a
result, they spend more time in cold canvassing
for new sales prospects than DP men.
Another difference is the length of the training
period. For DP, it covers approximately
a year and a half, while it is only a few
months for ET. There are thus a number of
fairly important differences between DP and
ET selling which could have caused the sales
interest patterns needed for success in each
job to differ. These differences would explain
why earlier company studies have found that
the DP key was not valid for ET sales and
the ET key not valid for DP sales.

The intercorrelations of the DP and ET
keys with Strong’s sales keys raised some
interesting points. For DP salesmen, the
correlations of the valid DP key with Strong’s
keys Group IX, Sales Manager, and Life In- 
surance Salesman were .51, .52, and .52, re- 
spectively (Table 4). The size of these cor- 
relations showed that the DP key was meas- 
uring sales interest patterns generally similar 
to those measured by Strong’s sales keys. 
DP salesmen thus appeared to be similar to 
Strong’s key standardization groups of life 
insurance salesmen and sales managers. De- 
spite this, however, Strong’s sales keys were 
not valid for DP sales, presumably because of 
the restricted range of the high sales interest 
scores of the DP salesmen (Table 3). 
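
The significance statements attached to Table 4 can be illustrated with the usual conversion of a product-moment correlation to a t statistic, t = r·sqrt(df / (1 − r²)). The sketch below is illustrative only and is not taken from the article: the function name is hypothetical, and the two-tailed critical values quoted in the comments (about 1.98 at the .05 level and 2.61 at the .01 level for 138 df) are approximate textbook figures.

```python
import math


def t_from_r(r, df):
    """t statistic for testing a product-moment correlation against zero."""
    return r * math.sqrt(df / (1.0 - r * r))


# Correlations reported for the 140 DP salesmen (df = N - 2 = 138).
df_dp = 138
examples = [
    ("DP key vs. ET key", -0.13),
    ("DP key vs. Group IX (Sales)", 0.51),
]

for label, r in examples:
    t = t_from_r(r, df_dp)
    # Approximate two-tailed critical t for 138 df: 1.98 (.05), 2.61 (.01).
    print(f"{label}: r = {r:+.2f}, t = {t:.2f}")
```

Under these assumptions the DP-ET correlation falls short of the .05 criterion, while the correlation with Group IX far exceeds the .01 criterion, in line with the text.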

The custom-built ET key, on the other 
hand, was negatively related to Strong’s keys 
(Table 4). The correlations ranged from
−.34 for Sales Manager to −.48 for Life
Insurance Salesman, and indicated that the
ET sales interest pattern was quite different 
from the sales interest patterns measured by 
Strong’s sales keys. ET salesmen thus ap- 
peared to be a different type of salesman from 
the life insurance salesmen and sales man- 
agers used by Strong to construct his sales 
keys. 

These findings indicated the danger of using 
Strong’s sales keys in selecting applicants for 
a given sales job without prior validation. 
Scoring ET sales applicants on Strong’s sales 
keys would have classified them in a ranking 
negatively related to their ranking on the 
valid ET key. For example, successful ET 
salesmen would have tended to score low on 
the Life Insurance Salesman key and termi- 
nators high. 

A possible explanation of the negative re- 
lationships between the ET key and Strong’s 
sales keys may be the different sales ap- 
proaches used by ET salesmen and Strong’s 
standardization groups because of the nature 
of the product sold. The ET salesman usu- 
ally carries an electric typewriter with him on 
prospect calls, and tries to give a demonstra- 
tion of its operating features to his prospects. 
Thus, the ET sales presentation differs con- 
siderably from that used in selling life insur- 
ance, for example, where the sales arguments 
have to be made without reference to a tan- 
gible product. 

Since many other products are sold by
means of a similar tangible sales presentation,
the findings in this study raise doubts
about the suitability of Strong’s sales keys 
for selecting applicants for this type of sales 
position. The validity of the standard Strong 
sales keys for different sales positions may 
thus be more limited than generally believed. 
This may partly explain why the literature 
contains little evidence of the validity of the 
Strong sales keys for other than casualty and 
life insurance sales. 

The results with the ET key further sug- 
gested that more caution might also be needed 
in using the Strong sales keys for counseling 
purposes. If the sales patterns measured by 
these keys are not suitable for all sales posi- 
tions, the use of these keys might present a 
misleading picture of the suitability of a 
counselee’s interest pattern for a number of 
sales jobs. In view of the widespread use of 
Strong’s sales keys in counseling, further in- 
vestigation of this area is needed. 


Summary 


Two custom-built Strong sales keys were
validated on two different types of salesmen
(N = 578) in the same company. Each key
was valid only for the type of salesman used
in the item analysis to construct the key. The
absence of significant correlation between the
two keys indicated the independence of the
two sales interest patterns related to success
in different sales positions in one company.
One of the company keys was significantly
positively related to three Strong sales keys
(Life Insurance Salesman, Sales Manager,
and Group IX), although the latter were not
valid for salesmen in the company studied.
The other key was significantly negatively
related to Strong’s sales keys. Use of the
Strong sales keys in selection for the latter
sales position therefore would have resulted
in serious misclassification of applicants.

These findings suggested that valid sales 
interest patterns can be quite specific, and 
that the available Strong sales keys might 
give misleading results if used for selection 
and counseling purposes in some sales areas. 
Received April 29, 1957.

The Superiority of Gloved Operation of Small Control Knobs 1


William Leroy Jenkins 


Lehigh University 


Wearing a glove on the operating hand 
might be expected to affect adversely the 
smallest amount of rotary movement a sub- 
ject can make on a tactual-kinesthetic basis 
(mean least turn) and also the time required 
to make discrete settings on a linear scale. 
Data pertinent to these questions have been 
extracted from a study concerned with the 
influence of a variety of factors on mean least 
turn and on the time required to make set- 
tings on a linear scale (2). 


Apparatus 


The apparatus for measuring mean least turn con- 
sists of a control shaft mounted on ball bearings 
and provided with a mirror reflecting a beam of
light onto a scale, thus giving a magnified measure 
of any rotation of the shaft. 

The linear scale apparatus has been previously de- 
scribed (3). It permits measurement of the time 
required to move a pointer by means of a control 
knob from the center of a linear scale to a pre- 
selected lighted insert. In the present study, inserts 
3/16 in. right and left of center and inserts 4 in. right
and left of center were used. The error tolerance of 
.007 in. was determined by the width of the pointer 
(.118 in.) in relation to the width of the inserts 
(.125 in.). The control ratio was such that the 
pointer moved 1.18 in. for each complete turn of the 
control knob. 
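
The geometry of the linear scale apparatus can be checked with a short calculation. The sketch below is illustrative only: it takes the 1.18 in. of pointer travel per knob turn and the pointer and insert widths from the text, and it assumes that the near inserts lie 3/16 in. from center, a reading consistent with the 57° knob turn cited in the Discussion. The function and constant names are hypothetical.

```python
POINTER_TRAVEL_PER_TURN_IN = 1.18  # pointer travel per complete knob turn (from the text)
POINTER_WIDTH_IN = 0.118           # width of the pointer (from the text)
INSERT_WIDTH_IN = 0.125            # width of a lighted insert (from the text)


def knob_degrees(pointer_distance_in):
    """Knob rotation, in degrees, needed to move the pointer a given distance."""
    return pointer_distance_in / POINTER_TRAVEL_PER_TURN_IN * 360.0


# Assumed near-insert distance of 3/16 in. (see the lead-in).
near_degrees = knob_degrees(3 / 16)

# Error tolerance: the slack between insert width and pointer width.
tolerance_in = INSERT_WIDTH_IN - POINTER_WIDTH_IN
tolerance_degrees = knob_degrees(tolerance_in)

# Far inserts at 4 in. require more than three complete turns.
far_turns = 4.0 / POINTER_TRAVEL_PER_TURN_IN

print(f"near insert: {near_degrees:.1f} degrees of knob turn")
print(f"tolerance:   {tolerance_in:.3f} in. = {tolerance_degrees:.1f} degrees")
print(f"far insert:  {far_turns:.2f} complete turns")
```

On these figures the near setting needs roughly 57° of knob turn, the tolerance corresponds to roughly 2°, and the far setting needs between three and four complete turns.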


Procedure 


Subjects were Lehigh University students who 
were paid at the current rate. In least turn meas- 
urements, S averted his head so that he could not 
see the knob, grasped the knob, turned it as little as 
possible right or left, and then released the knob. 
The turn was recorded to the nearest tenth of a de- 
gree. In making settings on the linear scale appa- 
ratus, S was given a ready signal, and a preselected 
insert was lighted. The S turned the control knob 
until the pointer was within the limits of the lighted 
insert and released the clutch. Time was measured 
from the instant of lighting-up of the insert to the 
instant of clutch-release to the nearest hundredth of 
a second. 


1 This report is based on data taken from a larger 
study under contract AF33(616)-2850 between the 
Institute of Research of Lehigh University and the 
USAF Air Research and Development Command, 
Wright-Patterson Air Force Base, Ohio.

Knob position, knob orientation, and knob diameter
entered as independent variables in addition to
the barehand-gloved factor. Each S made 40 least
turns and 40 linear scale settings under each set of
conditions and 16 to 20 Ss were used in each part.

For gloved operation, S wore an MA-1 double fly- 
ing glove consisting of an inner woolen glove and an 
outer limp leather shell. Some least turn measure- 
ments were also taken with S wearing a stiff rubber- 
covered glove such as is used in handling corrosive 
chemicals. 


Results 


Figure 1 shows the percentage differences 
of gloved compared to barehand operation in 
relation to knob diameter. Each circle rep- 
resents the difference for a group of 16 to 20 
Ss under a particular set of conditions (knob 
positions or orientations). Solid circles indi- 
cate statistically significant differences. The 
small flag on two of the circles means that 
the stiff rubber-covered glove was used. All 
others involved the MA-1 double flying glove. 

With small knob diameters, gloved opera- 
tion is consistently superior. This is true not 
only for mean least turn but also for time to
make settings at both distances. With the
larger knob diameters, the consistent superi- 
ority of gloved operation disappears. In fact, 
in mean least turn, there is a hint that bare- 
hand operation may be significantly superior 
under some conditions. 


Discussion 


As a part of a broader study of the effect
of gloves on control operation time, Bradley
(1) used a 14-in. knob in four horizontal
positions. His procedure required S to turn the
knob 40° within a tolerance of 2° so that a
light went out and remained out. This is
roughly analogous to making a setting on the
linear scale apparatus at 3/16 in. distance (57°
knob turn) to a tolerance of .007 in. (2°
knob turn). In two positions mean time with
the MA-1 double flying glove was slightly shorter,
in the other two positions slightly longer, than
in barehand operation. The over-all difference
was not statistically significant. The
14-in. diameter in our study showed a
statistically significant difference in favor of
gloved operation.

The apparent paradox of better operation
with gloves is not readily resolved. The
smaller mean least turn taken alone might
be explained in terms of slippage of the hand
inside the glove; i.e., if the glove surface
moves less than the hand inside it, a smaller
measured mean least turn would result for
the same amount of actual turn. But this
facile explanation fails utterly to show why
times for making settings on a linear scale
are shorter with gloved operation, and shorter
not only with settings at 3/16 in. distance but
also with settings at 4 in. distance requiring
over three complete turns of the knob before
the final adjustment can be made. No
reasonable explanation of the shorter linear scale
setting times seems to be forthcoming to date.


Summary 


The least amount of turn on a tactual- 
kinesthetic basis and the time to make set- 
tings on a linear scale were studied in bare- 
hand operation and with MA-1 double flying 
glove. With small knobs, gloved operation 
was superior in both. With larger knobs, the 
superiority was lost. No ready explanation
of the phenomena has been developed.

Several investigators have studied the relationships
between expressed and measured
values through the use of self-value ratings 
and the original Study of Values (1). Their 
purposes for conducting such research varied. 
Pintner (6) and Anderson (3) attempted to 
examine more closely the apparent relationships
or lack of relationships between professed
(subjective) values and inventoried
(objective) values. Stanley (7) had a simi- 
lar purpose, but improved on previous meth- 
ods of obtaining and analyzing the raw data. 
Vernon and Allport (8) used the relationships 
between expressed and measured values as a 
somewhat “unsatisfactory” indication of the 
empirical validity of their original Study of 
Values (SV). Finally, Fensterheim and Tres- 
selt (5) studied the influence of expressed and 
measured values on the perception of other 
people, and so indirectly related expressed 
and measured values to each other. 

Although the findings of these studies agree 
fairly well, certain doubts about the adequacy 
of their methodological designs prompted the 
writers to attempt a replication of previous 
research. Some refinements introduced in the
present investigation are as follows: 

1. Data on measured values were obtained 
through the administration of the Allport- 
Vernon-Lindzey revision (2) of the SV. 

2. A method of self-rating which more 
closely approximates the answering and scor- 
ing procedures of the SV was utilized. 

3. Two types of self-rating sheets were 
used, and the data from both were analyzed 
separately. 

Method

Instruments. Data on measured values were obtained
from the revised SV based on Spranger’s six
value areas.


1 The data were collected while both authors were 
at the University of Missouri. 


Data on expressed values were obtained from two 
sources: A definitional rating sheet (DR) and an oc- 
cupational rating sheet (OR). Both rating sheets 
were identical in direction and format except for 
their explanations of Spranger’s six value areas. Each 
value scale extended from “1” to “9” points with 
respective labels of “lowest possible,” “extremely 
low,” “very low,” “low,” “average,” “high,” “very 
high,” “extremely high,” and “highest possible.” The 
directions were as follows: 

“Use the above Value Rating Sheet to indicate how 
much you value (prize) ‘theoretical,’ ‘economic,’ 
‘aesthetic,’ ‘social,’ ‘political,’ and ‘religious’ qualities 
for yourself. In other words, indicate the degrees 
to which you would like these six characteristics to 
describe you. Below are explanations of each value 
category. 

“In order to mark your rating for each category, 
cross out with an ‘X’ the most appropriate number 
in each of the six columns. Be sure that all six rat- 
ings add up to 30. If the sum of the ‘numbers 
crossed out’ does not add up to 30, then alter the 
ratings without falsifying them so it will.” 

These instructions involve a method of answering 
and scoring which is similar to that inherent in the 
SV. In other words, forced-choice answers are re- 
quested, and the final results are expressed in terms 
of continuous and possibly tied measures. This lat- 
ter procedure seems to be an improvement over 
Stanley’s (7) method of having Ss rank themselves 
without ties on the value scales. 
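
The forced-sum rule in the directions (six ratings on a 1-to-9 scale that must total 30) can be expressed as a simple check. The sketch below is only an illustration of that rule, not a procedure described by the authors; the function name and the two example sheets are hypothetical.

```python
SCALES = ("theoretical", "economic", "aesthetic", "social", "political", "religious")
REQUIRED_TOTAL = 30  # six ratings averaging 5, the point labeled "average"


def rating_sheet_problems(ratings):
    """Return a list of ways a sheet violates the directions (empty if it is valid)."""
    problems = []
    for scale in SCALES:
        if scale not in ratings:
            problems.append(f"missing rating for {scale}")
        elif not 1 <= ratings[scale] <= 9:
            problems.append(f"{scale} rating {ratings[scale]} is outside 1-9")
    total = sum(ratings.get(scale, 0) for scale in SCALES)
    if total != REQUIRED_TOTAL:
        problems.append(f"ratings total {total}, not {REQUIRED_TOTAL}")
    return problems


# Hypothetical sheets: one balanced, one rating every value as "high".
balanced = {"theoretical": 7, "economic": 5, "aesthetic": 3,
            "social": 6, "political": 4, "religious": 5}
inflated = {scale: 6 for scale in SCALES}

print(rating_sheet_problems(balanced))
print(rating_sheet_problems(inflated))
```

Because the total is fixed, a subject cannot rate every value high; raising one rating requires lowering another, which is what makes the self-ratings comparable to the forced-choice SV scores while still allowing ties.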

The DR explanations for each value area were as 
follows: 


Theoretical. The theoretical man is interested primarily
in the discovery of truth, i.e., thinking &
knowing. He most highly values being rational,
logical, critical, and intellectual.

Economic. The economic man is interested primarily
in what is useful, i.e., survival & efficiency.
He most highly values being practical, industrious,
businesslike, and wealthy.

Aesthetic. The aesthetic man is interested primarily
in form and harmony, i.e., beauty & loveliness.
He most highly values being sensitive, expressive,
artistic, and appreciative of attractive appearance.

Social. The social man is interested primarily in 
the love of people, i.e., sympathy & service. He most 
highly values being tender, kind, helpful, and unselfish.

Table 1

Group Correlations Between Expressed and Measured Values

Column headings: Self-Ratings × SV Scores for Pintner (N = 187),
Stanley (N = 66), Vernon & Allport(a) (N = 48), DR (N = 76), and OR;
Self-Ratings × Self-Ratings for DR × OR (N = 76).
[Coefficients for the six SV scales are not legible in the source.]

a Vernon and Allport (8, p. 245) averaged five external ratings with the self-rating for each subject.
* Significant at .05 level.
** Significant at .01 level.


Political. The political man is interested primarily
in power, i.e., might & control. He most highly
values being strong, influential, authoritative, and
renowned.

Religious. The religious man is interested primarily
in the unity of world outlook, i.e., ultimate
belief & understanding. He most highly values being
mystical and comprehending life’s wholeness, final
purpose, and deepest meaning.


The OR explanations for each value area were as 
follows: 


Theoretical. The theoretical man would most like 
to be either a scientist, mathematician, or philosopher. 

Economic. The economic man would most like to 
be either a banker, businessman, or sales manager. 

Aesthetic. The aesthetic man would most like to 
be an artist in the field of either literature, painting, 
music, or architecture. 

Social. The social man would most like to be 
either a doctor, nurse, or social welfare worker. 

Political. The political man would most like to be 
either a politician, civic leader, or military com- 
mander. 

Religious. The religious man would most like to 
be either a clergyman, religious worker, or active 
church member. 


Subjects. Ss ranged in age from 18 to 49 with a 
median of 23. They were divided into two groups 
according to the administration sequences of the 
three instruments. Forty-five Ss (28 males and 17 
females), comprising practically the total population 
of three undergraduate psychology courses during 
the 1955 summer session at the University of Mis- 
souri, made up Group A. The administration se- 
quence of instruments for that group was SV-DR- 
OR. Thirty-one Ss (26 males and 5 females) in a 
general psychology course during the 1955 summer 
session at the University of Missouri made up Group 
B. The administration sequence of instruments for
that group was OR-DR-SV. The investigation is
based, therefore, on a total N of 76.


Results 


Two major approaches to studying the 
relationships between expressed (subjective) 
and measured (objective) values are dis- 
cernible in previous investigations. Certain 
results of this study will be compared with 
the findings of the other researchers under 
subheadings characterizing these two methods 
of investigation. 

Group consistency method. In this method, 
a correlation coefficient between expressed and 
measured values is calculated for each of the
six SV scales using all Ss in each computa- 
tion. This procedure reveals the degree to 
which a given group’s professed values cor- 
respond to its inventoried values. Table 1 
reveals that the product-moment correlations 
obtained by the present investigators substan- 
tially agree with the rank-order and product- 
moment correlations obtained by previous in- 
vestigators, except on the Social and Political 
scales in which present correlations are con- 
siderably higher. The correlations reported 
by Pintner (6) and Stanley (7) appear to be 
similar, while those found by Vernon and All- 
port (8) and the present investigators seem- 
ingly coincide. The one exception in the lat- 
ter comparison is the correlation for the So- 
cial value. 

Through the use of Z transformations and
two-tailed tests of significance as suggested
by Edwards (4, pp. 131-132), no statistically
significant difference was found between group
consistency coefficients based on the DR and
those based on the OR.

Table 2

The Relationships Between Z Transformed DR Intra-Individual Coefficients and Other Variables

DR Intra-Individual Coefficients Transformed to Z
Variables: Age, SD of SV, SD of DR, Theoretical, Economic, Aesthetic,
Social, Political, Religious
[Only the Females (N = 20) column heading is legible in the source;
the coefficients are not recoverable.]

* Significant at .05 level.
** Significant at .01 level.

Intra-individual consistency method. In
this method a correlation coefficient between
expressed and measured values is calculated
for each subject using all six SV scales in each
computation. This procedure reveals the degree
to which a given individual’s professed
values correspond to his inventoried values.
The product-moment correlation coefficients
between DR markings and SV scores range
from −.44 to .83 with a median r of .46,
while the coefficients between OR markings
and SV scores range from −.54 to .86 with
a median r of .54. While Fensterheim and
Tresselt (5) reported higher intra-individual
coefficients (one third of theirs were above
.82), their median of .66 is roughly similar to
the medians obtained in this study. Stanley
(7) found generally lower intra-individual
coefficients, but his range of −.81 to .98 is
similar to Fensterheim and Tresselt’s results,
and his median of .39 is perhaps within an
equivalent range of all.
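
The difference between the two methods is easiest to see on a small subjects-by-scales table: group consistency correlates down each column (one coefficient per SV scale), while intra-individual consistency correlates across each row (one coefficient per subject). The sketch below uses made-up ratings and scores for four hypothetical subjects; none of the numbers come from the study, and the helper function is an assumption introduced for illustration.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


SCALES = ["Theoretical", "Economic", "Aesthetic", "Social", "Political", "Religious"]

# Made-up data: one row per subject, one column per scale.
self_ratings = [
    [7, 5, 3, 6, 4, 5],
    [4, 6, 5, 5, 7, 3],
    [5, 3, 7, 6, 3, 6],
    [6, 7, 4, 3, 6, 4],
]
sv_scores = [
    [48, 40, 33, 45, 36, 38],
    [38, 44, 40, 39, 47, 32],
    [41, 34, 50, 44, 33, 38],
    [43, 47, 37, 35, 42, 36],
]

# Group consistency: one r per scale, computed over all subjects.
group_r = [
    pearson_r([row[j] for row in self_ratings], [row[j] for row in sv_scores])
    for j in range(len(SCALES))
]

# Intra-individual consistency: one r per subject, computed over the six scales.
intra_r = [pearson_r(r, s) for r, s in zip(self_ratings, sv_scores)]

for name, r in zip(SCALES, group_r):
    print(f"group r, {name}: {r:.2f}")
for i, r in enumerate(intra_r, start=1):
    print(f"intra-individual r, subject {i}: {r:.2f}")
```

With real data the group method yields six coefficients in all, whereas the intra-individual method yields one coefficient per subject, which is why its results are summarized by a range and a median.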

Intra-individual consistency and other vari- 
ables. In order to test expressed-measured 
consistency as it relates to other variables, all 
intra-individual coefficients were correlated 


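Table 3

The Relationships Between Z Transformed OR Intra-Individual Coefficients and Other Variables

OR Intra-Individual Coefficients Transformed to Z
Legible column headings: Females (N = 20), Group A (N = 45), Group B (N = 31)
Variables: Age, SD of SV, SD of OR, Theoretical, Economic, Aesthetic,
Social, Political, Religious
[Coefficients are only partly legible in the source, and their cell
positions are not recoverable. Fragments, in extraction order:
22, .54*, 37, 23, 32, 32, 33, 18, 11, −.27, 35°, .25, 15, AS]
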
* Significant at .05 level.
** Significant at .01 level.

Table 4

Group Correlations Between Expressed and Measured Values
for “High Religious” and “Low Religious” Males

                  DR Self-Ratings × SV Value Scores
               “High Religious”     “Low Religious”
                   Males(a)             Males(b)
SV Scale           N = 14               N = 14

Theoretical         .188                 .389
Economic            .408                 .778**
Aesthetic           .107                 .774**
Social              .097                 .670**
Political           .360                 .789**
Religious           .372                 .504

a Range of SV Religious scores = 44-64.
b Range of SV Religious scores = 15-32.
** Significant at .01 level.


with Ss’ respective ages, SV and DR (or OR) 
standard deviations, and SV value scores. 
However, the authors followed Stanley’s (7) 
use of Z transformations (4, pp. 126-127) to 
normalize the negatively skewed distribution 
of intra-individual coefficients. Tables 2 and 
3 give the product-moment correlation coeffi- 
cients by sex and group between the Z trans- 
formed intra-individual coefficients and the 
listed variables. 

Intra-individual coefficients fail to reveal a 
statistically significant relationship to chrono- 
logical age. This finding is somewhat at vari- 
ance with Anderson’s observation that “the 
results reflect a maturity factor in that the 
self-rankings of the older group were in 
closer agreement with their score ranks than 
were those of the younger group” (3, pp. 354-355).
But the finding is in line with the .14
coefficient reported by Stanley (7). The
present data also parallel Stanley’s finding of
a positive and statistically significant correlation
between intra-individual coefficients and
SV standard deviations. This same relation- 
ship apparently holds to a lesser degree when 
DR and OR standard deviations are corre- 
lated with intra-individual coefficients. 

Additional results show that the intra-in- 
dividual coefficients and the Theoretical scores 
for males are positively and significantly re- 
lated regardless of whether the DR or OR is 
used as the self-rating sheet. Also, the relationship
between intra-individual coefficients
and Social scores for females is positive and 
statistically significant when the DR is the 
self-rating sheet (.55), and approaches sig- 
nificance when the OR is the self-rating sheet 
(.33). 

Interestingly, males with high agreement 
between their expressed and measured values 
tend to score lower on the Religious scale of 
the SV than males with low agreement. The 
relationship is statistically significant only 
when the DR is the self-rating sheet, but a 
trend is also apparent even when the OR is 
used. In view of this finding an additional 
analysis was made. All male Ss were ar- 
ranged according to their SV Religious scores. 
Table 4 reveals the product-moment group 
consistency coefficients for both the “high re- 
ligious” (upper one fourth) males and “low 
religious” (lower one fourth) males. In all 
six value areas the group consistency coefficients
for the “high religious” males are
lower than those for the “low religious” males.
However, through the use of Z transforma- 
tions and two-tailed tests of significance as 
suggested by Edwards (4, pp. 131-132), the 
two groups were found to differ significantly 
only in the Aesthetic and Social areas (.05 
level). 
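
The comparison of two group consistency coefficients can be sketched with the common Fisher r-to-Z test for independent correlations. The text cites Edwards for the procedure without giving the formula, so the form used here (difference of Z values divided by sqrt(1/(n1 − 3) + 1/(n2 − 3)), referred to the normal curve) is an assumption, and the function name is hypothetical. The example uses the Aesthetic coefficients from Table 4 (.107 and .774, N = 14 in each group).

```python
import math
from statistics import NormalDist


def compare_independent_rs(r1, n1, r2, n2):
    """Two-tailed test of the difference between two independent correlations.

    Returns the normal deviate and its two-tailed probability.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-Z transformation
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    deviate = (z2 - z1) / standard_error
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(deviate)))
    return deviate, p_two_tailed


# Aesthetic scale: "high religious" males vs. "low religious" males.
deviate, p_value = compare_independent_rs(0.107, 14, 0.774, 14)
print(f"normal deviate = {deviate:.2f}, two-tailed p = {p_value:.3f}")
```

Under these assumptions the Aesthetic difference exceeds the two-tailed .05 criterion of 1.96. With only 14 subjects per group the standard error is large, so even sizable differences between coefficients can fail to reach significance.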

From the information contained in Tables 
2 and 3, subgroups were compared through 
the use of Z transformations and two-tailed 
tests of significance. In Table 2 males and 
females differ significantly on the Social scale 
(.05 level) and approach significance on the 
Economic scale (.06 level), whereas in 
Table 3 they differ significantly on both the 
Theoretical (.05 level) and Aesthetic (.05 
level) scales. In other words, when the re- 
sults from both tables are considered together, 
intra-individual coefficients tend to have a 
higher positive relationship to males’ Theo- 
retical and Economic scores than to females’. 
And similarly, these same coefficients tend to 
have a higher positive relationship to fe- 
males’ Aesthetic and Social scores than to 
males’. Therefore, certain sex differences 
were revealed in the present study. How- 
ever, no statistically significant difference was 
found between intra-individual coefficients 
based on the DR and those based on the
OR. Similarly, no statistically significant
difference was revealed between results based 
on Group A Ss and results based on Group B 
Ss. 


Discussion 


The results of this study suggest that a 
high positive relationship exists between ex- 
pressed and measured values for most stu- 
dents, but this relationship is not sufficiently 
high to make the two interchangeable. Nev- 
ertheless, for a few students expressed and 
measured values seem to reveal almost identi- 
cal results. Thus, the problem for future re- 
search is to isolate those factors which may 
determine when the relationship between ex- 
pressed and measured values will be near per- 
fect correspondence, mediocre similarity, or 
complete reversal. In the present study three 
variables seem to be so related: variability on 
the SV, DR, and OR; religious emphasis on 
the SV; and sex differences on the SV. 

As mentioned previously, students with 
large variation in their SV scores tend to have 
more similar expressed and measured values 
than those with small variation in their SV 
scores. This same relationship holds to a
lesser degree for variability on the self-rating
sheets. Of course, these results are relevant
only if little or no “contamination” of data 
occurs when standard deviations are corre- 
lated with intra-individual coefficients based 
partially on these standard deviations. The 
authors advance the interpretation that stu- 
dents’ response-sets to answer value-state- 
ments at a less variable level may reveal an 
insensitivity to, an uncertainty in, or an un- 
awareness of their value-systems. There- 
fore, variability would indicate higher self- 
sensitivity, greater self-certainty, or clearer 
self-awareness. Additional research may give 
evidence for the plausibility of this inter- 
pretation. 

Concerning religious emphasis, the more re- 
ligious male students are (according to their 
Religious SV score), the less similarity they 
tend to display between their expressed and 
measured values. In addition, since all group 
consistency coefficients of “high religious” 
males are lower than those of “low religious” 
males (even though the differences are statistically
significant in only two value areas),
perhaps the former group is really less aware 
of its value-system and value-emphasis than 
the latter. But another possible interpreta- 
tion is that highly religious students may not 
represent a homogeneous population. A sub- 
group within highly religious students actu- 
ally may have almost identical expressed and 
measured values, but in the present study its 
members were in the minority. If such were 
the case, these students might be distinguish- 
able by their really being religious instead of 
just wanting to seem religious, or by their 
using religious zeal as an expression of per- 
sonal ideals instead of as a defense against 
personal problems. Further research may 
not only confirm the present finding but also 
may substantiate one of the above interpreta- 
tions. 

The significant and near significant sex dif- 
ferences in Tables 2 and 3 give rise to an 
interesting hypothesis. Men who score high 
on the Theoretical, Economic, and Political 
scales of the SV, the so-called “masculine” 
values (1, 2, 8), tend to have more similar 
expressed and measured values than those 
who score low on these scales. Likewise, this 
same trend is apparent for women who score 
high on the Aesthetic, Social, and Religious 
scales of the SV, the so-called “feminine” 
values. In fact, males and females who score 
high on values attributed to their own sex 
and score low on values attributed to the op- 
posite sex seemingly have closer expressed 
and measured values than individuals in the 
reversed situation. But since this interpreta- 
tion is only apparent as a trend in the data, 
further research utilizing different procedural 
and statistical analyses is indicated. 

An additional result is the lack of any 
statistically significant difference between the 
two self-rating instruments, even though the 
DR used abstract definitions and the OR 
concrete occupational titles. Nevertheless, as 
merely a general trend, the DR appears to 
be more sensitive to intra-individual consist- 
ency relationships, whereas the OR seems to 
be more sensitive to group consistency rela- 
tionships. Further research specifically in- 
vestigating these two types of self-rating 
sheets is necessary.

Even though the authors failed to find a significant
relationship between intra-individual
coefficients and chronological age, this failure 
may be due to inappropriate methodology. 
Perhaps chronological age is highly related to 
intra-individual coefficients when one person 
is compared at different times, but not when 
different people are compared at one time. 
Appropriate methodology would then require 
the study of the same Ss over a period of 
years. Future research of the longitudinal 
variety is therefore recommended. 

As a replication, the present investigation 
corroborates most of the findings reported in 
previous studies. Nevertheless, methodological
refinements (particularly the administration
of the revised SV in conjunction with
a sounder way of obtaining self-ratings of
values) resulted in a marked increase in
group consistency coefficients for the Social 
and Political scales. Apparently the restand- 
ardization of the Social scale on the revised 
SV brought the value-inventory more in line 
with the value-rating in regard to what they 
both “get at.” But since the increased cor- 
relation for the Political scale cannot be ex- 
plained entirely on the same basis, perhaps 
the generally higher relationships in this study 
may be due also to improved self-value-rating
instruments.


Summary 


This study, in the order of a combined 
replication of a number of earlier ones, intro- 
duced methodological improvements in the 
investigation of the relationships between ex- 
pressed and measured values. Through the 
administration of the revised Study of Values 
and two self-rating sheets (one using defini- 
tions of the six Study of Values scales, the 
other utilizing related occupational titles), 
data on 76 Ss were obtained and analyzed for 
relationships. 

On the basis of group and most intra-individual
correlations, Ss seem to have a relatively
significant awareness of their measured
values. Nevertheless, individuals vary con- 
siderably in the similarity between their ex- 
pressed and measured values—from near per- 
fect correspondence to complete reversal. 

The analysis clearly points out that the 
more students vary in their scores on the 
Study of Values, the more similar their ex- 
pressed and measured values tend to be. To 
some extent this is also true for variability 
on the self-rating sheets. In addition, the 
higher male students prize religious values 
as measured by the Religious scale of the 
Study of Values, the less similar their ex- 
pressed and measured values tend to be. On 
the other hand, male and female students who 
score high on values attributed to their own 
sex and score low on values attributed to the 
opposite sex seemingly have closer expressed 
and measured values than students in the 
opposite situation. 

Certain areas where future investigations 
might contribute were also noted. 


Received April 22, 1957. 

Recent papers have stressed the importance
of obtaining information on self-perceptions 
for an increased understanding of the psycho- 
logical aspects of industrial organization and 
functioning (2, 4). Self perceptions are a 
function of both the individual’s enduring 
traits and his present social roles. Thus, 
when a person describes how he perceives 
himself, he must to some extent use his pres- 
ent social environment as a reference point to 
which he can relate his own behavior and his 
own personality traits. For any employed per- 
son, the work environment is an important 
segment of his total social environment, and 
therefore, self-descriptions “would appear to 
be of special interest since they provide cues 
as to the place the individual sees for himself 
in the organization and to the manner in 
which he sees himself functioning” (4). 

Traditionally, management personnel and
line workers have been the two major groups
in an industrial organization that have been
contrasted with each other in consideration
of the personnel aspects of industrial operations.
For this reason, a comparison of the
self-perceptions of management personnel vs.
the self-perceptions of line workers would
seem especially relevant for understanding the
psychological problems of the work situation.

The importance of perception in labor-man- 
agement relations has been suggested by a 
number of writers. A study of role percep- 
tions by Haire, for example, illustrates the 
striking effects of differences in perception of 
a neutral person when he is labeled as either 
a union official or a management official, and 
when the perceivers are either workers or 
members of management (3). Haire con- 
cluded from his data that “the general im- 
pression of a person is radically different 
when he is seen as a member of management 
than when he is seen as a representative of 
labor,” and that “management and labor each 
sees the other as less dependable than himself
. . . less appreciative of the other’s position
than he himself is . . . [and] deficient
in thinking, emotional characteristics, and in- 
terpersonal relations in comparison with him- 
self” (3, p. 211). Since self-perceptions are 
a factor in the role perceptions of others, as 
well as in the role perceptions of one’s own 
group, a study of how line workers view 
themselves as contrasted with how manage- 
ment personnel view themselves should help 
in interpreting the behavior of labor and 
management groups in wage negotiations and 
other labor relations situations. 

At the same time, such data should pro- 
vide a check on whether individuals in these 
groups see themselves in accordance with 
some of the traditional characteristics at- 
tributed to them by others and even by some 
members of their own group. In other words, 
it is relevant to ask whether line workers look
at themselves the same way others are prone 
to think about them. A similar question can 
be asked in regard to management personnel. 

The purpose of this study is to compare the 
self-perceptions of line workers and manage- 
ment personnel employed in a variety of dif- 
ferent industries. 


Method and Procedure 


The instrument used in this study to obtain the 
self-descriptions was a 64-pair forced-choice adjec- 
tive check-list developed by Ghiselli and used in 
previous studies (1, 2, 4). Thirty-two of the pairs 
involve adjectives which are descriptive of different, 
but desirable, social traits. The S is asked to select 
the adjective in each pair that he believes best de- 
scribes himself. The other 32 pairs involve adjec- 
tives descriptive of socially undesirable traits, and 
S must choose the word in each pair that is least 
characteristic of himself. If individuals in two
groups (e.g., line workers and management person- 
nel) check the list, it is possible to discern descrip- 
tive patterns that distinguish the two groups. 

The self-description inventory was filled out by
463 management personnel and 320 line workers.
For the purpose of this study, “management personnel”
are defined as those who have any supervisory
duties. Thus, first line supervisors and foremen are
included in the management group. All those who
have no supervisory duties, i.e., those in the lowest
level of the organization, are classified as “line
workers.” Both the management and line individuals
are from a number of different organizations of
different sizes and located in widely scattered
geographical areas throughout the country. The inventory
was administered ordinarily in connection with
some sort of a personnel audit, rather than in a
strictly research connection.


Table 1

Items Differentiating Management Personnel
and Line Workers

Management Personnel         Line Workers

See themselves as:           See themselves as:
inventive                    cooperative
loyal                        dependable
resourceful                  planful
clear-thinking               efficient
sincere                      calm
fair-minded                  thoughtful
responsible                  reliable
dignified                    civilized
imaginative                  self-controlled
logical                      adaptable

Do not see themselves as:    Do not see themselves as:
immature                     quarrelsome
affected                     moody
cold                         stubborn
infantile                    conceited
intolerant                   nervous
foolish                      careless
weak                         selfish
rude                         self-centered
rattle-brained               disorderly
submissive                   fussy
self-pitying                 hard-hearted
cynical                      aggressive
dissatisfied                 outspoken
sly                          excitable
irresponsible                impatient


Results 


The responses of the individuals in the two 
groups were analyzed for each of the 64 pairs 
of adjectives. Twenty-five, or slightly over 
one-third, of the items differentiated the two 
groups at the .05 level of confidence or better. 
Ten of the pairs were composed of favorable 
adjectives, and the other 15 pairs of unfavor- 
able adjectives that “least describe” the in- 


dividual. The 25 differentiating pairs are 
presented in Table 1, where the responses of 
management personnel are given in the left- 
hand column, and those of the line workers 
in the right-hand column. It should be noted 
that when the results are presented in this 
manner, the differences are relative, and do 
not necessarily indicate that one adjective in 
a pair was favored by the majority of one 
group and the other adjective in the pair by 
a majority of the other group. A majority 
of both groups may have favored the same 
adjective in a pair, but one group favored it 
significantly more often than the other group. 
For example, for the first pair of adjectives 
listed in Table 1, a majority of both groups 
favored “cooperative,” but the line workers 
selected it relatively more often than manage- 
ment personnel, the percentages being 87.2 
for line and 78.6 for management. In other 
words, proportionately more line workers saw 
themselves as “cooperative” than did man- 
agement personnel, even though a majority of 
both groups would describe themselves as 
“cooperative.” It also must be noted that 
when a person checks one word in a pair on 
a forced choice inventory such as used in this 
study, he is not necessarily rejecting the 
other word. He is only indicating that one 
adjective is more or less descriptive of him 
than the other adjective. 
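
The comparison for the “cooperative” item can be sketched numerically. The report does not name the significance test applied to each pair, so the pooled two-proportion z test below is an assumption made for illustration only; the function names are hypothetical. The percentages (87.2 for line, 78.6 for management) and the group sizes (320 and 463) are taken from the text.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions,
    using the pooled proportion to estimate the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2))


# Assumed test; figures for "cooperative" as reported in the text.
z = two_proportion_z(0.872, 320, 0.786, 463)
p = two_sided_p(z)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Under these assumptions the difference of 8.6 percentage points gives a z of roughly 3.1, which falls beyond the .05 level even though a majority of both groups chose the same adjective.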

The two lists given in Table 1 provide evi- 
dence for reasonably well-integrated pictures 
of the two groups. Management personnel, 
as contrasted with line workers, perceive
themselves in terms of leadership qualities. 
They picture themselves as possessing traits 
that are ordinarily associated with leadership 
behavior. They see themselves as strong and 
relatively dominant types of individuals in 
comparison with how members of the line see 
themselves. The management personnel de- 
scribe themselves as possessing a good degree 
of initiative and independence of thought and 
action. Further rounding out the picture of 
leadership qualities are the selection of self- 
descriptive traits that imply maturity and 
fairness. Throughout, the members of management
relatively more often choose adjectives
that describe themselves as being “fair-minded,”
“responsible,” and not being “immature,”
“intolerant,” “irresponsible,” and
“infantile.” They look at themselves as be- 
ing the type of people who have qualities of 
leadership and who know how to exercise 
them appropriately. They also indicate a 
concern with appearing as straight-forward 
people in their dealings with others. 

The self-descriptions of the line workers 
present a contrasting picture to that of the 
management self-descriptions. The line work- 
ers relatively more often checked adjectives 
that would place them more towards the “fol- 
lower” end of the leadership dimension. Es- 
sentially, they see themselves as being ca- 
pable, steady, and agreeable types of indi- 
viduals. The picture is one of “nice guys” 
who can be depended upon, and who will not 
try to cause trouble. A number of the adjectives
that they chose relatively more often
than management closely fit together to obtain
this picture: i.e., “cooperative,” “dependable,”
“reliable,” and “self-controlled,” and
not “quarrelsome,” not “stubborn,” not “aggressive,”
and not “outspoken.” Socially,
they, like management, view themselves as
being concerned about the rights of others
and of not appearing to be egocentric. In
comparison to management personnel, fewer
of them picture themselves as likely to be 
irritable and easily upset. They seem to feel 
that they have a closer control over their 
emotions than do members of management. 
In summary, the line workers describe them- 
selves as even-tempered individuals who can 
be relied upon to perform quite adequately 
in cooperation with other individuals. 


Discussion 


When the over-all self-perceptions of the 
two groups are contrasted with each other, 
what seems to emerge is that management 
personnel have pictured themselves in a way 
that closely fits a “leader” stereotype, while 
line personnel give the complementary picture
of a “follower” stereotype. The particular
25 pairs of traits that distinguish between
the two groups concern items that are typically
thought to be especially associated with
leadership and followership. Consistently
throughout the 25 pairs, management personnel
select traits more nearly towards the
leadership end of the dimension, and line personnel
select traits that put them towards
the followership end. 

The self-perception description of manage- 
ment people is one that might be expected if 
management’s role is conceived as primarily 
a role of leadership and direction. In one 
sense, the self-perception description of line 
workers is not necessarily that which might 
have been expected. In another sense, it is. 
If line workers are considered only in regard 
to their hierarchical position in the organiza- 
tion, they must of necessity be followers more 
often than management personnel simply by 
virtue of their position at the bottom of the 
organization. The results indicate that line 
workers perceive themselves in accordance 
with this expectation. On the other hand, if 
line workers are thought of as being members 
of “labor,” then the pattern of self-perception 
traits that emerges is not necessarily that 
which fits an often pictured stereotype of 
“militant labor.” There is little if anything 
in the self-description pattern of line person- 
nel, as that pattern is contrasted with man- 
agement personnel, that shows these workers 
viewing themselves in any way except as rela- 
tively passive, cooperative and nonaggressive 
persons. 

An implication of the foregoing is that line 
workers may be perceiving themselves much 
more in terms of their position in the organi- 
zation, rather than as members of a labor 
union or “workers” group. If this is so, it 
means that the traditional picture of the 
“typical” worker would definitely have to 
take into account his organization role in re- 
lation to management, as well as his labor 
role vis A vis management. The results indi- 
cate, then, that if either management groups 
or union groups think of workers merely in 
terms of people who are out to oppose and 
fight management’s leadership at every point, 
their views are not consistent with how line 
personnel actually look at themselves. 

The results also raise some interesting 
points in connection with the recruitment of 
management personnel from line workers. If 
most of the people who tend to enter the or- 
ganization at the line level and stay there for 
a period of time come to view themselves in 
terms of characteristics that are closely associated
with followership rather than leadership,
would they then function effectively if
promoted to management supervisory jobs? 
A related question would be whether they 
would be satisfied in those jobs. As was 
pointed out earlier, self-descriptions of the 
type reported here are presumably deter- 
mined by the relatively enduring traits the 
person sees himself possessing and by the 
particular roles he sees himself fulfilling at 
the moment. To the extent that the self-per- 
ceptions represent long-lasting traits, the data 
would suggest that many line workers would 
not be suitable for management type posi- 
tions. To the extent the self-descriptions are 
a function of the particular role of the mo- 
ment, then the data would not provide an- 
swers to the questions. In the latter case, 
the data would imply that line workers clearly 
see their role of the moment, but would not 
necessarily indicate how these workers would 
fit into new leadership roles required in man- 
agement positions. 


Summary 


A 64-pair forced-choice adjective check-list
was filled out by 463 management personnel
and 320 line workers. The responses of the
individuals in the two groups were analyzed
for each of the pairs of adjectives, and it was
found that 25 pairs differentiated the two
groups at the .05 level of confidence or better.
From these differentiating adjectives, integrated
pictures of the self-perceptions of the
two groups were developed. Management
personnel tended more often to describe themselves
in terms of leadership-type traits,
whereas line workers relatively more often
pictured themselves in cooperative-follower
terms. These findings were discussed as to
their implications for understanding organizational
structure and functioning and for
labor-management relations.

Lyman W. Porter

In an earlier study (1) a marked practice
effect was demonstrated when the Minnesota 
Clerical Test was administered successively 
at intervals of two to seven days. As pro- 
posed in the report of that study, one pos- 
sible means of overcoming such practice ef- 
fect, or at least to minimize it to tolerable 
levels, may be the use of alternate test forms. 
The purpose of the present study was to test 
this hypothesis with two alternate forms of 
the Minnesota Clerical Test, Form A, now on 
the market, and Form B, not on the market. 


Procedure 


During October, November, and December, 1953, 
and November, December, and January, 1954-55, 
the two forms of the Minnesota Clerical Test were 
given to 575 female applicants for clerical jobs at 
three institutions located in Minneapolis. Included 
were 300 applicants at the University of Minnesota, 
150 at a major industrial plant, and 125 at a na- 
tional life insurance company. 

The two forms of the test were administered to 
the applicants in A B B A order to counterbalance 
any biasing effects from differences in test difficulty, 
fatigue, or other uncontrollable variation. Form A, 
then B, were given to the first applicant; Form B, 
then A, to the second; and so on from applicant to
applicant. 
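
The alternation described above can be written out as a simple rule. This is only a sketch: the function name and the 1-based applicant numbering are assumptions made for illustration, not details given in the report.

```python
def form_order(applicant_number):
    """Return the order of forms for an applicant, counted from 1 in
    order of testing: odd-numbered applicants take A then B,
    even-numbered applicants take B then A."""
    return ("A", "B") if applicant_number % 2 == 1 else ("B", "A")


# First four applicants: A B, B A, A B, B A.
orders = [form_order(i) for i in range(1, 5)]
for number, order in enumerate(orders, start=1):
    print(number, order)
```

Read across successive applicants, each pair of applicants yields the sequence A B B A, so that each form is taken first and second about equally often.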

The second form was given one minute after com- 
pletion of the first. A longer, more natural time 
span between tests would have been desirable, but 
was quite impractical to arrange. A one-minute in- 
terval, it should be noted, very likely resulted in a 
maximum influence of practice effect. 


Results and Discussion 


Table 1 summarizes the results for combined
groups of applicants. Some practice
effect did occur, it is apparent, on both the
Numbers and Names tests. On the Numbers
test, the difference between means from trial
to trial is 7.7 points, significant at the .001
level. The Names test shows a comparable
difference in means of 7.8 points, likewise significant
at the .001 level. The changes in
centile ranks, representing practice effect,
amount to about 10 centile points on employed
clerical worker norms.

1 This research was made possible by a grant-in-aid
from the Graduate School of the University of
Minnesota.

It is important to note that the practice 
effect for both number checking and name 
checking is less than one-third of the size of 
the standard deviations. Also, it is note- 
worthy that the standard deviations remain 
constant from Trial 1 to Trial 2 for both 
parts of the clerical test. 

Comparing practice effects from repeated 
administrations of alternate forms and identi- 
cal forms—the latter reported in the earlier 
study (1)—we find a smaller gain in score 
with alternate forms, supporting the hypothesis
of the study, i.e., that alternate forms of
the Minnesota Clerical Test would reduce 
practice effect. 


Table 1 


Practice Effects with Alternate Forms of the Minnesota 
Clerical Test Administered to 575 Applicants 
for Clerical Work 


Name Checking 


Number Checking 


1 2 





120.8 
26.9 
+7.7 
11.3 
* 001 
82 


128.5 
27.5 


Centile rank of 
mean scores 
(employed cleri- 
cal workers) 20 
Table 2


Comparison of Practice Effects with Identical and 
Alternate Forms of the Minnesota 
Clerical Test 








Number Checking Name Checking 
1 2 1 2 








Mean Score 


Identical Forms* 
Alternate Forms 


142.7 
119.3 


137.0 
120.8 


Difference Between 
Mean Scores 
Identical Forms 20.9 
Alternate Forms 7.7 


Percentage Increase 
in Mean Scores 
Identical Forms 15.3% 19.6% 
Alternate Forms 6.4 6.5 





® Mean scores for first and second es of identical 
tests to 32 yok psychology students. 


Comparative practice effects from the two 
methods of test administration are shown in 
Table 2. 

Though the two study groups are not 
strictly comparable (one scoring considerably
higher on the Minnesota Clerical Test
than the other),² there is a strong tendency
for alternate forms to show less increase in 
mean score than identical forms, on both 
Numbers and Names tests. The increase in 
mean score on the Numbers test is 15.3% 
for identical forms, 6.4% for alternate forms. 
On the Names test, the increase is 19.6% for 
identical forms, 6.5% for alternate tests. 
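
The alternate-form percentages can be checked approximately from the reported gains. The sketch below assumes that the first-trial alternate-form means are 119.3 for Numbers and 120.8 for Names, which is how the entries of Table 2 appear to read; the gains of 7.7 and 7.8 points are those given in the text. The function name is hypothetical.

```python
def percent_increase(first_trial_mean, gain):
    """Gain from first to second trial, as a percentage of the first-trial mean."""
    return 100.0 * gain / first_trial_mean


# Assumed first-trial means (read from Table 2) and reported gains.
numbers_alternate = percent_increase(119.3, 7.7)
names_alternate = percent_increase(120.8, 7.8)
print(f"Numbers: {numbers_alternate:.2f}%  Names: {names_alternate:.2f}%")
```

Both values come out close to 6.5 per cent, agreeing with the reported 6.4% and 6.5% to within the rounding of the published means and gains.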

In the practical terms of employee selec- 
tion, the amount of practice effect found with 
alternate forms of the Minnesota Clerical Test 
seems, to the authors, to be tolerable. With 
increased time intervals between repeated 
testings, the practice effect probably would 
be even less than that found here, when the 
tests were administered only a minute apart. 

The immediate practical implication would
seem to be publication of additional forms of
the Minnesota Clerical Test to combat the
practice-inflated test scores of itinerant applicants
in everyday personnel situations.
Development of additional forms of the Minnesota
Clerical Test would be complicated,
however, by what strongly appears to be a
substantial difference in form difficulty.

2 Subjects in the first test were undergraduate and
graduate students and extension students, groups
which tend to score high on the Minnesota Clerical
Test. Subjects in the second test were day-to-day
applicants for clerical positions.

Quite unexpectedly, Form B proved to be 
more difficult (score lower) than Form A, 
particularly in Numbers: 


Names:    B < A by 2.5 points
Numbers:  B < A by 5.9 points


The differences in form difficulty and tests 
of significance are summarized in Table 3, 
based on a special analysis of 293 subjects 
from the total sample. Scores for each test 
in each form were combined irrespective of 
test order, justified by an analysis showing 
difficulty effect to be independent of order. 
For example, scores for the Names test of 
subjects taking Form A before Form B and 
those taking Form B before Form A were 
combined into a single set of scores. 

Compared with mean practice effects, it is 
apparent that the difference in difficulty for 
Numbers (5.9) is nearly equal to the differ- 
ence in practice effect (7.7). Unaccountably 
and somewhat contrary to reasonable expec- 
tation, Names show a smaller difference in 
form difficulty (2.5) than Numbers (5.9). 

Rationally, one would assume that the sim- 
ple subject matter of the Numbers test would 
inevitably yield comparable levels of difficulty. 
Yet the difference, sizeable and statistically 
significant, is there, explained by no immedi- 


Table 3 


Differences in Form Difficulty on the 
Minnesota Clerical Test 








Numbers 


Names 





Form Form 
A B 


F orm 
B 





293 293 293 
123.0 120.9 
2.6 
2.5 
.96 
30 
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ate reason other than the generalization that 
certain patterns of names and numbers in one 
form must be inherently more difficult than 
in the other. Which is to say that resolving 
the enigma of form inequality is a subject 
for further study in itself. 

Before alternate forms of the Minnesota 
Clerical Test could be put on the market, 
which is the practical hypothesis underlying 
this project, the disparity in difficulty level of 
the Numbers test would have to be resolved. 
Ignoring test difficulty and developing addi- 
tional test forms would only confound prac- 
tice and difficulty effects hopelessly and ren- 
der comparison of test scores with a single 
norm impossible. Empirically establishing 
difficulty levels and recalibrating tests would 
involve the formidable problems of testing 
large samples and equalizing test difficulty. 

One practical first step would seem to be a 
basic study of test content, particularly the 
Numbers test. Content analysis of number 
frequency and number patterns might reveal 
significant differences between forms account- 
ing for the variation in difficulty. Or, num- 
ber patterns and number frequencies could be 
varied experimentally to isolate the basic diffi- 
culty variables. Although this study was 
planned in anticipation of a practical recom- 
mendation, there seems to be no alternative 
now but to recommend a basic course of
action before any development of additional 
forms of the Minnesota Clerical Test. 


Summary 


1. Repeated administrations of alternate 
forms of the Minnesota Clerical Test show 
less practice effect than repeated tests with 
identical forms. 

2. The degree of practice effect is a 6.4% 
increase in mean score on the Numbers test, 
6.5% on the Names test, when Forms A and 
B are administered in ABBA order. 

3. The immediate development of alternate 
test forms of the Minnesota Clerical Test is 
contra-recommended in view of the substan- 
tial differences in test difficulty on the Num- 
bers test. 

4. Any attempt to develop alternate forms 
of the Minnesota Clerical Test for the Num- 
bers test should be preceded by a basic study 
of test-content factors underlying inequalities 
in form difficulty. 


Received May 31, 1957. 

One of the unanswered questions in code
training concerns the best procedure for help- 
ing students eliminate their individual errors 
in receiving. It is well known that learning 
to receive some Morse Code characters is 
more difficult than learning to receive others, 
and that students have a strong tendency to 
confuse certain characters with certain other 
characters while some characters are seldom 
confused. It seems likely that a meaningful 
categorization of such errors might serve as 
a basis for improving training procedures in 
this area. 

There have been a number of previous at- 
tempts to classify such code error patterns. 
These categorizations have been largely sub- 
jective and involve certain untested prior as- 
sumptions. Seashore and Kurtz (3), for ex- 
ample, suggest the following classification: 

1. Errors involving the shortening of the 
signal. 

2. Errors involving the lengthening of the 
signal. 

3. Errors characterized by complete sub- 
stitution of dots for dashes and dashes for 
dots. 

4. Errors characterized by alteration of the 
elements within the signal. 

5. Miscellaneous. 

On the other hand, War Department Technical
Manual TM 11-459, “International
Morse Code (Instructions)” (6) suggests the
existence of two types of errors, “dotting
errors” and “copying too close.” The first
of these is said to occur at high-speed receiving
levels and consists of characters which
differ from each other only in the number of
dots contained in the signal. The error of
“copying too close” is said to occur when a
student starts writing a response before he
has heard all of the signal. The effect of this
is to perceive a signal shorter than the signal
sent. This corresponds to the first category
suggested by Seashore and Kurtz.

1 This research was carried out while the writers
were with the Air Force Personnel and Training Research
Center. The work was done under ARDC
Project No. 7706 in support of the research and development
program of the Air Force Personnel and
Training Research Center, Lackland Air Force Base,
Texas. Permission is granted for reproduction, translation,
publication, use and disposal in whole and in
part by or for the U. S. Government.

The purpose of the present study was to 
organize code errors made by radio operator 
trainees into meaningful categories derived em- 
pirically from code error data. Specifically, 
the study: (a) determined the order of diffi- 
culty of the most frequent substitution errors, 
(b) applied the techniques of factor analysis
to the inter-correlations among these errors, 
and (c) examined the relationship between 
the error factors obtained and the most fre- 
quently made errors. The primary objective, 
then, was to group the most frequent errors 
into a small number of independent cate- 
gories. The inference is that such categories 
correspond in some way to underlying sources 
of difficulty contributing to a variety of code 
substitution errors. 


Method 
Subjects 


The Ss in this study were Air Force radio opera- 
tor trainees. These students had passed their code 
checks for receiving code at a speed of six groups 
per minute and were learning to receive code at a 
speed of eight groups per minute.²


Data Collection Procedure 


The data collected were “substitution errors” in 
which the student’s response to an auditory code 
signal consisted of writing down a number or letter 


2 The speed per minute was computed on the basis 
of groups consisting of five Morse Code characters 
(letters or numbers) sent in a randomized order. 
Table 1


Ranking of Code Errors 





(B) 


(S) +-- 


(H) 


(J) - 


(6) 


(1) - 

(V) ... 
(4) «+>. 
(H)---- 
(W)--- 


Proportion 
Number of of Substi- 
Response Error Substitutions tutions* 


“aie 095 
075 
tees 059 
.058 
053 
050 
.035 
027 
025 
.022 
021 


(Z) ---- 
(Z) - 


(Y) 


(5) 
(2) 


(M) 
(8) 


(d) 


(7) 
(G) 
(7) 
(8) 


(T) 


(E) - 
(H):--- 


(C) -. 


(4) -+-- 
(3) --- 
(U) -- 


(J) - 
(8) - 


(X)--- 


(V)--- 


207 
192 
211 
193 
189 
186 
185 
169 
150 

78 
147 
136 
136 
135 
137 
134 
132 
126 


Total: 11,709 


* Proportions have been rounded from five places. Each stimulus signal except the signal ----- was sent between 13,000
and 15, times. The signal ----- was sent 7645 times.


different from the number or letter which was actu- 
ally sent. The student might, for example, write 
the letter “S” when the letter “H” was sent (in this 
case the student perceives ... instead of ....). 

An “error pair” consists of the signal sent, to- 
gether with the erroneous response to this signal. 
In the above example, the “error pair” is H-S. Since 
the signals for 26 letters of the alphabet and the 
numbers 0-9 are sent, there are 36 × 36 − 36, or
1,260 possible substitution errors. Of course, many 
of these (e.g., substituting --- for .) are highly un- 
likely to occur. 
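
The count of possible substitution errors can be verified by enumeration. The sketch below lists every ordered pair of distinct characters among the 26 letters and 10 digits; the variable names are illustrative only.

```python
import string

# The 36 characters sent: 26 letters and the digits 0-9.
characters = list(string.ascii_uppercase) + [str(d) for d in range(10)]

# An error pair is (signal sent, erroneous response); the response must
# differ from the signal, which removes the 36 correct pairs.
error_pairs = [
    (sent, written)
    for sent in characters
    for written in characters
    if sent != written
]

print(len(characters), len(error_pairs))
```

The enumeration yields 36 characters and 1,260 ordered pairs, with H-S and S-H counted as different errors.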

Data from a total of 807 radio operator students 


were collected, but this sample was reduced consid- 
erably. First, all code checks which fell below a 
level of 80% accuracy were discarded. Checks 
which fell below this level of accuracy typically in- 
cluded many omissions and instances where the stu- 
dent had lost his place in the code series. The 80% 
level was chosen in part because of its agreement 
with the procedure used by Seashore and Kurtz (3). 
Second, student records which included fewer than
1.5 code checks per day were discarded.³ The combination
of these two criteria reduced the sample of
student records used in the present analysis to 299.

3 Students were dropped from the study if, after
15 school days, they had failed to qualify at the
eight groups-per-minute speed.


Tabulations and Analysis 


The student error records were tabulated in the 
following way: For each of the 299 students in our 
sample, a 36 × 36 table was prepared. The 36 rows
of this table corresponded to the 36 Morse Code 
stimulus signals sent. The 36 columns of the table 
corresponded to the 36 possible erroneous reception 
responses which a student might make to each 
stimulus signal. Substitution errors were identified 
by collating the student’s code check results with a 


Richard W. Highland and Edwin A. Fleishman 


master key. The cells of the table permitted tabula- 
tion of the substitution error frequencies. 

Since the students in the sample had not all been 
exposed to the same code checks, the raw frequencies 
of the stimulus-response errors were converted into 
proportions. Thus, a student’s “score” for a given 
error was the proportion of times he had made that 
error. 
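The tabulation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the text does not say what the proportions were taken of, so the sketch assumes each error count is divided by the number of times the stimulus signal was sent to that student. The function name and the sample data are hypothetical.

```python
import string
import numpy as np

CHARS = string.ascii_uppercase + string.digits   # the 36 stimulus signals
INDEX = {c: i for i, c in enumerate(CHARS)}

def error_proportions(sent, received):
    """Return a 36 x 36 table of substitution-error proportions for one student.

    Rows are signals sent, columns are erroneous responses. Each cell is the
    number of times that error occurred divided by the number of times the
    row signal was sent (an assumed denominator).
    """
    counts = np.zeros((36, 36))
    times_sent = np.zeros(36)
    for s, r in zip(sent, received):
        times_sent[INDEX[s]] += 1
        # Omissions (None) and correct responses are not substitution errors.
        if r is not None and r != s:
            counts[INDEX[s], INDEX[r]] += 1
    denom = times_sent[:, None]
    # Leave rows at zero for signals the student was never sent.
    return np.divide(counts, denom, out=np.zeros_like(counts), where=denom > 0)

# Hypothetical code-check fragment: H is sent four times and copied as S once.
table = error_proportions("HHHH6", ["H", "S", "H", "H", "6"])
print(table[INDEX["H"], INDEX["S"]])
```

Working with proportions rather than raw counts keeps students comparable even though they were not all exposed to the same code checks.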

Table 1 lists, in order, the 34 most frequent sub- 
stitution errors obtained in the present study. Actual 
code signals sent and the code signal perceived in 
each instance are presented along with the propor- 
tion of times each error was made. These 34 pairs 


Table 2

Final Rotated Matrix of Error Factors in Code Reception*

[The body of the table is not legible in the source. Column headings: Stimulus and Response (letters and numbers); Code Pattern (Stimulus, Response); Factors.]

* Rounded from three places and decimals omitted.
represent about half the total substitution errors
made out of the 1,260 possible pairs. 

The 299 “scores” for each of these 34 error pairs 
were correlated with the scores for each of the other 
error pairs. The resulting correlation matrix was 
factor analyzed using the Thurstone Centroid Method 
(5). The rotations necessary to achieve orthogonal 
simple structure were accomplished using Zimmer- 
man’s procedure (7). 
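The first steps of this analysis can be sketched in Python with NumPy. The sketch rests on several assumptions that are not in the source: the scores are synthetic, the communality estimate is the largest absolute correlation in each column, and only the first centroid factor is extracted, with no sign reflection, no further factors, and no rotation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 299 x 34 matrix of error proportions:
# one shared source of difficulty plus noise. Not data from the study.
common = rng.normal(size=(299, 1))
scores = 0.6 * common + rng.normal(size=(299, 34))

# 34 x 34 matrix of intercorrelations among the error-pair scores.
R = np.corrcoef(scores, rowvar=False)

def first_centroid_loadings(R):
    """First centroid factor: column sums divided by the root of the grand sum."""
    Rc = R.copy()
    off_diag = R - np.diag(np.diag(R))
    # Communality estimate (assumed): largest absolute correlation per column.
    np.fill_diagonal(Rc, np.abs(off_diag).max(axis=0))
    column_sums = Rc.sum(axis=0)
    return column_sums / np.sqrt(column_sums.sum())

loadings = first_centroid_loadings(R)
print(np.round(loadings, 2))
```

A full centroid analysis would subtract the product of these loadings from the correlation matrix, reflect variables as needed, and repeat for each further factor before rotation.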


Results
Order of Difficulty of Error Pairs 


The extent to which the order of difficulty 
in Table 1 agrees with the results obtained by 
Seashore and Kurtz (3)⁴ is indicated by the
fact that 23 of the 34 variables used in the 
present study were among the 34 most fre- 
quent substitution errors identified by Sea- 
shore and Kurtz. 

Table 1 indicates that the most difficult 
error pair was 6-B (that is, -.... was perceived
as -...). This error was made an
average of 9.5 times for each 100 times
that the stimulus signal (-....) was transmitted.
It should be noted that the total
number of errors committed for a given 
stimulus signal was considerably larger than 
is indicated in Table 1. These figures only 
represent how often a particular substitution 
error was made; they do not include other 
substitution errors for the same stimulus sig- 
nal (beyond the 34th most frequent) and 
they do not include omissions. 


Interpretation of Factors 


The final rotated matrix is presented in
Table 2.⁵ Factor loadings of .30 and above
were considered significant in defining a
factor.

⁴ Since Seashore and Kurtz presented data for only
the three most frequently occurring substitution
errors for each code character, the comparison is
suggestive rather than exact.

⁵ Two tables showing the correlation matrix and
the centroid matrix have been deposited with the
American Documentation Institute. Order Document
No. 5450 from the ADI Auxiliary Publications
Project, Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies.
Make checks payable to Chief, Photoduplication
Service, Library of Congress. After completion of
the analysis, it was discovered that an error had
occurred in the original tabulation of Variable 5. It
will be noted that, as a result, the communality and
first centroid loading for this variable are entirely
out of line. Rather than repeat the entire analysis,
it was decided to eliminate Variable 5 from the factor
interpretations, except in the case of the factor
on which it was most heavily loaded.

Factor I, Dash Estimation. This factor is
confined to those signals which contain a
number of dashes. In all of these signals,
the dashes occur in a series either at the beginning
or end of the signal or comprise the
entire signal. In no case is there a dot interspersed
within the series of dashes, either in
the way the signal is sent or perceived. The
error is always in estimating the correct number
of dashes in the signal. This error may
be one of omission or addition; that is, the S
may perceive one too many or one too few
dashes, but he never “shortens” a dash to a
dot or “lengthens” a dot to a dash. The
number of dots in these signals is always perceived
correctly. This factor is extremely
well defined, as loadings drop off sharply
after .42. Of the 34 error pairs included in
the analysis, the six variables on this factor
represent, without exception, the only errors
of this type in the analysis.



                          Code Pattern
Frequency   Error                   Response   Factor
Rank        Pair      Stimulus      Error      Loading

11          J-W       .---          .--        .60
5           1-J       .----         .---       .51
19          0-M       -----         --         .45
7           J-1       .---          .----      .44
13          8-Z       ---..         --..       .43
30          Z-8       --..          ---..      .42


Factor II, Dot Estimation. This factor in- 
volves signals consisting mainly of dots. The 
dots always come in a row either at the be- 
ginning or end of the signal or else the sig- 
nal consists only of dots. No dash ever sepa- 
rates the dot sequences either in the signal 
actually sent or in the signal as perceived. 
This appears to be the dot counterpart of 
Factor I. In this factor the error is always 
in estimating the correct number of dots in a
series. No errors are made in estimating the 
number of dashes. The factor includes both 
overestimates and underestimates of the num- 
ber of dots in a series; however, it appears 
most strongly in errors of underestimation 
with signals containing a long series of dots. 
Apparently the factor does not extend to errors
in which only two dots are sent and perceived
as one, as in Variable 33 (.. sent, .
received) or Variable 28 (--.. sent, --. received).
This factor is general to more variables
than any other and contributes to more
of the most frequently occurring errors. Thus, 
the four most frequent errors in the sample 
are saturated with this factor. 


                          Code Pattern
Frequency   Error                   Response   Factor
Rank        Pair      Stimulus      Error      Loading

2           H-S       ....          ...        .55
3           5-H       .....         ....       .50
1           6-B       -....         -...       .44
12          7-Z       --...         --..       .43
32          S-I       ...           ..         .40
4           H-5       ....          .....      .38
8           4-V       ....-         ...-       .37
24          B-D       -...          -..        .35
9           V-4       ...-          ....-      .34
10          S-H       ...           ....       .34
29          Z-7       --..          --...      .33


Factor III, End-element Substitution. The
stimulus characters loaded on this factor are
of varied types (i.e., predominantly dots, predominantly
dashes, and mixed elements), but
the type of error made is completely consistent
from character to character. In each
case, an error of substitution is made and this
error always occurs on the last element of the
character sent. The only other variable representing
an error of this type is Variable 16
(....- sent, ..... received). Variable 16
has a loading on this factor of .25, next
highest to the variables listed. Among the
34 variables in the analysis, none but these
involves final element substitution. The substitution
may be either a dot for a dash or
a dash for a dot. The trainee, in these instances,
always perceived the correct number
of elements per character.


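              Code Pattern
Error                   Response
Pair      Stimulus      Error

V-H       ...-          ....
5-4       .....         ....-
B-X       -...          -..-
Y-C       -.--          -.-.
H-V       ....          ...-
C-Y       -.-.          -.--

(The frequency ranks and factor loadings in this table are not legible in the source.)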



Factor IV, Internal Error. This factor ex- 
tends to fewer variables than was the case 
with the previous factors, and interpretation, 
therefore, is not as secure. All the charac- 
ters sent consist of both dots and dashes. 
These occur as two series, dots followed by 
dashes or dashes followed by dots. The char- 
acters do not involve changes from dots to 
dashes and back to dots again or changes 
from dashes to dots and back to dashes again. 
The distinguishing feature of three out of
four of these variables is that an internal sub- 
stitution error is made. These three vari- 
ables involve the sending of five-element char- 
acters; the substitution error occurs precisely 
in the middle element. Further, the error oc- 
curs at the end of the initial dot or dash se- 
ries within the signal. It is to be noted, how- 
ever, that this is the only type of internal 
substitution error made among the 34 vari- 
ables in the analysis. It is not known, for 
example, if this factor would extend to a sig- 
nal in which ..-. is sent and .--. is re- 
ceived. 


Frequency   Factor
Rank        Loading

22          .45
23          .41
27          .41
17          .30

(The error pairs and code patterns in this table are not legible in the source.)


Factor V, A Doublet. This is a doublet factor
consisting of one variable (V-4) in which
...- was sent and ....- was received and one
variable (4-V) in which ....- was sent and
...- was received. It can be seen that the
second of these variables is the inversion of 
the first. Apparently this factor represents 
nothing more than some specific source of 
difficulty restricted to these two signals. It 
is surprising that more doublets of this type 
(that is, those confined to reversals) did not 
appear in the analysis since 11 of the 34 sub- 
stitution errors studied are reversals of 11 
other errors among these 34. Instead, both 
variables constituting an inverted pair usu- 
ally received their highest loadings on the 
same factor, along with other error pairs. 
Apparently, the major source of difficulty 
underlying such inversion errors is general to
other kinds of errors. 

Factors VI and VII. These are considered 
to be residual factors with no apparent psy- 
chological meaning. 
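Because the four interpreted factors are described in terms of the physical make-up of the signals, the descriptions can be sketched as a rule-based classifier. This is an illustrative reading of the verbal definitions above, not the factor analysis itself: the function names and category strings are hypothetical, and the rules will label some pairs (for example I-E, Variable 33) as Dot Estimation even though the text notes that the factor apparently does not extend to them.

```python
from itertools import groupby

# International Morse Code for the 36 characters used in the study.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}

def runs(code):
    """Split a signal into runs, e.g. '-...' -> [('-', 1), ('.', 3)]."""
    return [(sym, len(list(group))) for sym, group in groupby(code)]

def classify(sent, received):
    """Label an error pair by the physical description of the error."""
    s, r = MORSE[sent], MORSE[received]
    rs, rr = runs(s), runs(r)
    symbols = [sym for sym, _ in rs]
    # Estimation errors: same order of runs, exactly one run miscounted, and
    # that run is the only series of its element, at the start or end.
    if symbols == [sym for sym, _ in rr]:
        changed = [i for i, (a, b) in enumerate(zip(rs, rr)) if a[1] != b[1]]
        if len(changed) == 1 and changed[0] in (0, len(rs) - 1):
            sym = symbols[changed[0]]
            if symbols.count(sym) == 1:
                return "Dash Estimation" if sym == "-" else "Dot Estimation"
    # Substitution errors: same number of elements, exactly one element differs.
    if len(s) == len(r):
        diff = [i for i, (a, b) in enumerate(zip(s, r)) if a != b]
        if len(diff) == 1:
            if diff[0] == len(s) - 1:
                return "End-element Substitution"
            if diff[0] > 0:
                return "Internal Error"
    return "Other"

for pair in ["J-W", "H-S", "V-H", "3-2"]:
    print(pair, classify(*pair.split("-")))
```

The doublet (Factor V) and the residual factors are not modeled here, since the text gives them no general physical description.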


Discussion 


It should be pointed out that the four fac- 
tors identified are not to be considered the 
only factors in code error patterns. The pres- 
ent study was confined to the 34 most fre- 
quently made errors. For example, it is pos- 
sible that initial-element substitution errors 
may cluster together; this would be the coun- 
terpart to our end-element substitution. How- 
ever, initial-element substitutions did not oc- 
cur with sufficient frequency to be included 
in the present study. It is reasonable to as- 
sume that the four factors in the present 
study are of the most practical importance 
since they were identified from the most fre- 
quently made errors. 

Seashore and Kurtz (3) list the number of 
times that the three most frequent substitu- 
tion errors for each of the 36 Morse Code 
characters were made during the second week 
of code training. Since this provides fre- 
quency information not available elsewhere 
on 108 common substitution errors, an at- 
tempt has been made to classify these errors 
in terms of the four factors found in the 
present study. The classification was ac- 
complished subjectively or on the basis of a 
factor loading from the present study where 
available. The results of this classification 
are presented in Table 3. It can be seen that 
not all of the 108 substitution errors were 
readily identifiable with our four factors. For 
this reason, a category of “All Other Errors” 
has been included. 

When classified in this way, it is clear that 
the factors of Dot Estimation and End-ele- 
ment Substitution account for the largest per- 
centage of errors in the Seashore-Kurtz data 
as well as in our own. Although these two 
categories contribute less than half of the 
108 stimulus-response pairs included in the 
Seashore-Kurtz list, they account for two-thirds
(66.3%) of the total number of substitution
errors made; thus, of the 9502 errors
made for the 108 S-R pairs, 6293 of


Table 3

Frequency of Several Kinds of Substitution Errors*

                                  Number of                 Per Cent
                                  Erroneous    Frequency    of Total
Category                          S-R Pairs    of Errors    Frequency

Factor I
  Dash Estimation                    11            801          8.4
Factor II
  Dot Estimation                     18          3,435         36.2
Factor III
  End-element Substitution Error     26          2,858         30.1
Factor IV
  Internal Error                     14          1,057         11.1
All Other Errors                     39          1,351         14.2

Total                               108          9,502        100

* The 108 substitution errors identified by Seashore and
Kurtz (3) classified according to factors identified in the present
study.

these were errors of Dot Estimation or End- 
element substitution. The remaining two 
categories derived from our factor analysis, 
Internal Error and Dash Estimation, con- 
tribute a combined total of 19.5% of the to- 
tal errors. Thus, our four factors can be 
used to account for 85.8% of the errors in 
the Seashore-Kurtz list.⁶
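The arithmetic behind these percentages can be reproduced from the counts in Table 3. The sketch below only recomputes figures already given in the table; the dictionary layout and variable names are illustrative.

```python
# (number of S-R pairs, frequency of errors) as listed in Table 3
table3 = {
    "Dash Estimation": (11, 801),
    "Dot Estimation": (18, 3435),
    "End-element Substitution": (26, 2858),
    "Internal Error": (14, 1057),
    "All Other Errors": (39, 1351),
}

total_pairs = sum(pairs for pairs, _ in table3.values())
total_errors = sum(freq for _, freq in table3.values())

# Per cent of total frequency for each category, rounded to one place.
percent = {name: round(100 * freq / total_errors, 1)
           for name, (_, freq) in table3.items()}

# Combined frequency of the two largest categories.
dot_and_end = table3["Dot Estimation"][1] + table3["End-element Substitution"][1]

print(total_pairs, total_errors, dot_and_end)
for name, value in percent.items():
    print(name, value)
```

Computed directly, 6293 of 9502 is about 66.2%; the 66.3% quoted in the text equals the sum of the two rounded category percentages (36.2 and 30.1).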

It is of special interest to note that the 
factors identified do not agree with the error 
categories arrived at subjectively by other 
investigators (see above). However, as we 
have shown, these factors derived empirically 
from the error intercorrelations do simplify 
the description of the same phenomena in 
terms of fewer orthogonal categories. A good 
example is the prior categorization of errors 
into those of “shortening” the stimulus char- 
acter (e.g., .... sent and ... perceived), and 
those of lengthening (e.g., ... sent, .... per- 
ceived). Our results show that errors of 
lengthening and errors of shortening may fall 
within either the Dot Estimation or Dash 
Estimation factors. The critical feature con- 
tributing to difficulty is not “lengthening” or 


⁶ If we classify subjectively the remaining 14.2%
of the errors, we find that 7.2% of them represent
a dot-dash inversion (e.g., -.... sent; .----
received) and 2.9% initial-element substitutions.
Whether these would eventually turn out to be empirical
factor categories is not known.
“shortening” but rather incorrect perception
of the number of dots or dashes in series. 
This is related to the finding, pointed out 
earlier, that inverted error pairs appear within 
the same factor. In this regard, both our 
findings and those of Seashore and Kurtz in- 
dicate that inversion errors of “shortening” 
occur more frequently than errors of “length- 
ening” involving the same pair; for example, 
.---- sent and .--- received occurs more
frequently than .--- sent and .---- received.
This kind of result, taken together
with the finding that the same factor con- 
tributes to both errors, indicates that re- 
sponses of “lengthening” and “shortening” 
do not differ in kind, but do differ in their 
probability of occurrence. The response tend- 
ency for “shortening” is ordinarily greater 
than the response tendency for “lengthening.”
It should be stressed that we have made 
no attempt to relate our “factors” to under- 
lying perceptual processes. These factor cate- 
gories represent descriptive labels based on 
the physical descriptions of the errors made. 
Relationships to more fundamental processes 
may be of theoretical interest and worthy of 
further investigation. A step in this direction 


has been taken in a recent study by Fleish- 
man, Roberts and Friedman (2). 


Implications for Training Methods 


In Air Force training, radio operator stu- 
dents are presented each of the 36 Morse 
Code characters with approximately equal 
frequency. Seashore and Kurtz (3) recom- 
mended that training emphasize those char- 
acters most frequently confused with each 
other.⁷ The present results suggest the additional
step of diagnosing students’ difficulties
in terms of the four underlying “sources”
identified and providing remedial training us- 
ing materials which emphasize these. Since 
the factors are orthogonal, diagnosis for in- 
dividual students may show difficulties char- 
acteristic of only one factor, or of combina- 
tions of factors. It is possible that one source 
of difficulty may yield to remedial training 
while another may not. The present categorization
provides a basis for experimental
research on this problem.

⁷ At least one experiment (4) has shown no advantage
of this approach in paired associate (Code
signal-letter) code learning.


Implications for Selection 


In the Air Force, and in other military 
branches, students are selected for training on 
the basis of testing procedures which include 
at least one aural code aptitude test. The 
most valid single code test has been found to 
be one which requires the learning of actual 
Morse Code signals, although over-all predic- 
tion can be increased by adding to this test 
other aural tests not involving actual Code 
(1). It is possible that the validity of code 
aptitude tests can be increased still further if 
such a test were constructed around the four 
factors identified in the present study. At 
least it would seem reasonable to include a 
representation of items sampling these fac- 
tors. Our findings also present an interest- 
ing hypothesis worthy of experimental test. 
Since two of the factors, Dot Estimation and 
End-element Substitution, contribute most of 
the errors that students made in training, it 
is possible that a code aptitude test empha- 
sizing these factors might prove particularly 
predictive of especially high levels of code 
proficiency. In other terms, such measures 
may be more predictive of final asymptotes 
in code learning than present measures. 


Implications for Proficiency Measurement 


Implications here are analogous to those 
presented above. A current procedure for 
evaluating a trainee’s level is the “code 
check” during which the trainee receives a 
random series of signal “groups” (see above) 
at a given code speed. It is likely that these 
are quite comparable from student to stu- 
dent, or class to class, etc. However, it is 
possible that consideration might be given to 
reapportioning the relative frequency of sig- 
nals sent within the above factor categories. 


Summary 


A factor analysis of correlations among the 
most frequent substitution errors in receiving 
International Morse Code was performed. 
Four principal factors were identified: Dash 
Estimation, Dot Estimation, End-element 
Substitution, and Internal Error. These results
differ somewhat from previous categories
arrived at subjectively. The Dot Estimation
factor was general to more different
types of substitution errors and contributed 
to the most frequently occurring errors. 


Received June 24, 1957. 
A Change in a Product Image


William D. Wells, Fedele J. Goi, and Stuart Seader 
Rutgers University, Newark Colleges 


Advertising campaigns are often designed 
to make a specific change in the reputation 
of a product or a brand. Tea advertising, for 
example, is currently stressing the idea that 
tea is “brisk,” “robust,” and “hearty,” in an 
effort to counteract the impression that tea is 
a namby-pamby drink. Parliament Cigarette 
advertising is another example. For years, 
Parliaments were characterized by a premium 
price, an odd package, and advertising which 
leaned heavily on snobbery. Now, Parlia- 
ments have a “new low price,” a standard 
package, and advertising designed to show 
that Parliaments may be smoked safely on 
either side of the tracks. 

The objective of this kind of advertising 
is to change the product “image” or product 
“personality” in such a way that the product 
itself will have wider appeal. It is an impor- 
tant objective because product images are 
known to have a strong influence on sales. 

An earlier report (2) presented an Adjective
Check List designed for survey use in
the study of product images. It also pre- 
sented, as specimen results, the images asso- 
ciated with Cadillac, Buick, Chevrolet, Ford, 
and Plymouth automobiles among a group 
of male college students. The earlier report 
was based on data gathered during October, 
1956, just before the introduction of the 1957 
models. The present report is based on data 
gathered from approximately the same group 
of respondents six months later. The con- 
trast between the two sets of results provides 
an illustration of the impact of the new mod- 
els and their attendant publicity upon the 
1956 brand images. 

The earlier report contained a detailed dis- 
cussion of the Adjective Check List itself, and 
of the rationale behind the forced choice de- 
sign of the questionnaire. Therefore, pro- 
cedure for the present report will be sketched 
only briefly here. 


Procedure 


The Adjective Check List used in both studies con- 
tained 108 trait names, selected for frequency of use 


and appropriateness to the problem of measuring 
product images. Respondents were asked to indicate 
first whether each trait was most typical of Cadillac 
Owners, Buick Owners, or Chevrolet Owners; then 
whether each trait was most typical of Chevrolet 
Owners, Ford Owners, or Plymouth Owners. Trait 
names and car names were systematically rearranged 
to avoid position biases. 

The respondents in the 1956 survey were given no 
special instructions about model year, so it is impossible
to be absolutely sure of the frame of reference
they used in answering. However, it seems
fairly safe to assume that most of them answered in 
terms of the general run of cars then on the road. 
In the 1957 survey, the frame of reference was speci- 
fied. Each questionnaire was plainly marked “1957 
models.” In both 1956 and 1957, respondents were 
100 fraternity members from Rutgers University, 
Newark Colleges. The overlap between the two 
groups was about 70%. 


Results and Discussion 


It is obvious that results obtained from 100 
college students can not be thought of as 
characteristic of the consumer population. 
However, the changes which occurred within 
this limited population provide a “test tube” 
demonstration that the introduction of a new 
model car, reinforced by a heavy advertising 
investment, can produce a marked attitude 
change in a relatively short time. 

Space limitations prevent reproduction of 
complete tables of results,¹ but lists of the
traits most associated with the various car 
owners will show the nature of the brand im- 
ages. In the lists which follow, trait names 
are ordered by frequency of mention. All 
frequencies exceed chance expectations at the
.01 level or beyond.

¹ A complete tabulation of the responses to each
adjective has been deposited with the American
Documentation Institute. Order Document No. 5504
from the ADI Auxiliary Publications Project,
Photoduplication Service, Library of Congress,
Washington 25, D. C., remitting in advance $1.25 for
microfilm or $1.25 for photocopies. Make checks payable
to Chief, Photoduplication Service, Library of
Congress. A mimeographed copy of the table may be
obtained free of charge by writing to William D.
Wells, Rutgers University, Newark Colleges, Newark
2, New Jersey.

Cadillac-Buick-Chevrolet. The 1956 data
showed the Cadillac-Buick-Chevrolet comparison
to be clearly stratified along class lines.
The Cadillac Owner was called rich, high- 
class, famous, important, fancy, proud, su- 
perior, and successful. The Buick Owner was 
called middle-class, brave, masculine, strong, 
modern, and pleasant. And the Chevrolet 
Owner was called poor, low-class, ordinary, 
plain, simple, practical, common, average, 
cheap, thin, little, friendly, and small. With 
a variable as powerful as price differential 
operating, and with no major changes in 
either the automobiles or their advertising, it 
is not surprising to find that the 1957 data 
show exactly the same trend. In 1957, The 
Cadillac Owner was still called rich, high- 
class, famous, important, etc.; The Buick 
Owner was called middle-class, masculine, 
rough, calm, and independent; and The 
Chevrolet Owner was called plain, average, 
poor, simple, little, practical, low-class, etc. 
A few of the differences between the 1956 
data and the 1957 data were statistically sig- 
nificant, but the differences were minor in 
size and import compared with the differences 
which occurred within the “low priced three.” 

Ford-Plymouth-Chevrolet. In the 1956
data, The Ford Owner appeared youthful and
dashing. The traits most often ascribed to
him were: masculine, young, powerful, good-looking,
rough, dangerous, strong, single,
merry, loud, active, cool, tall, interesting, 
sharp, and popular. In 1957, the traits most 
often ascribed to The Ford Owner were: dan- 
gerous, loud, rough, powerful, cross, thin, ac- 
tive, proud, and brave. This image seems 
rougher, tougher, and less debonair. 

In 1956, the image of The Plymouth 
Owner was not unfavorable, but it was defi- 
nitely stodgy. The Plymouth Owner was 
called quiet, careful, slow, silent, moral, fat, 
gentle, calm, sad, thinking, patient, honest, 
understanding, and content. Against this im- 
age, the Chrysler Corporation laid a radically 
changed automobile and high-style advertis- 
ing campaign which announced that the new 
Plymouth was not one, but three full years 
ahead. “In one flaming moment,” introduc- 
tory copy said, “Plymouth leaps three full 
years ahead—the only car that dares to break 
the time barrier! Plymouth’s traditionally
great engineering brings you the fabulous
new Fury ‘301’ V-8 engine . . . revolutionary
new Torsion-Aire ride . . . exhilarating
sports car handling . . . dramatic Flight-Sweep
Styling. The car you might have expected
in 1960 is at your dealer’s now!” (1,
pp. 18-19). It is hard to believe that the
quiet, careful, slow owner of an old Plymouth
would dare go near a car with a Fury engine.


The net effect of the new model and the 
high-style advertising was to shatter the old 
Plymouth image completely—at least as far 
as the present respondents were concerned. 
Of the 14 significant adjectives in the old 
image, not one remained in 1957. The 1957
Plymouth image consisted of six words: high- 
class, feminine, important, rich, different, and 
particular. 

In the 1956 data, the Chevrolet portion of 
the Ford-Plymouth-Chevrolet comparison was 
somewhat nondescript. Ordinary, fair, and 
common were the only adjectives significantly 
above chance at the .01 level. In 1957, the 
product image was still something less than 
exciting. The Chevrolet Owner was called 
small, low-class, little, simple, ordinary, and 
practical. It is worth noting that 1957 
Chevrolet advertising persistently billed the 
new Chevrolet as “sweet, smooth, and sassy.” 
Without the backing of a dramatically 
changed automobile, this advertising seems to 
have made little impression upon the mun- 
dane practicality of the good old Chevrolet. 


Summary 


A previous report described the personality 
stereotypes associated with five well-known 
automobiles by a group of college student 
respondents. The present report shows the 
changes in these stereotypes which resulted 
from the introduction and promotion of the 
1957 models. 


Received June 28, 1957. 
A Study of Occupational Stereotypes ¹


K. F. Walker


University of Western Australia 


Although the concept of stereotype has 
been fruitfully applied in social psychology, 
particularly in studies of public opinion and 
political and international attitudes, occupa- 
tional stereotypes have received little atten- 
tion. So far their empirical investigation has 
been confined to a few studies of stereotypes 
relevant to industrial relations. 

Haire and Grunes (4) adopted Asch’s tech- 
nique (1) of studying the way in which the 
perception of a single personality variable in- 
fluences the perception of other aspects of the 
personality, and found that perception of a 
person in the role of a factory worker modi- 
fied various other aspects of the observer’s 
view of the person. Haire (3) asked labor 
union and managerial personnel to check the 
adjectives they considered applicable to men 
shown in photographs with accompanying 
personality descriptions. He found marked 
differences in their responses according to 
whether the men were labeled as union offi- 
cials or plant managers. In another investi- 
gation reported in the same paper he found 
characteristic differences in the types of 
words and phrases used by labor and man- 
agement representatives in collective bargain- 
ing conferences. An earlier unpublished in- 
vestigation by Haire and Morrison (5) 
showed differences in the perception of labor 
and management personnel by children of 
higher and lower income groups. 

Stagner (8) had college students check 
adjectives they considered applicable to the 
average business executive and the average 
worker. They were also asked to check the 
adjectives they considered “good” and those 
they thought applied to themselves. Although 
many traits were ascribed to executives and 
workers in approximately equal frequency, 
the students saw definite differences between 
the two groups on a number of traits (rank
difference correlation +.44). Pro-labor and
anti-labor students differed in their perceptions
of the traits of both groups, the difference
being greater in the traits ascribed to
executives. Pro-labor students attributed to
themselves most of the traits they ascribed to
workers, and regarded these traits as good,
but they rejected the traits they ascribed to
executives and did not regard these traits as
good. Anti-labor students ascribed their own
traits to executives much more than to workers.
There was some evidence that the stereotype
of the executive was stronger than that
of the worker, especially in the anti-labor
students.

¹ This study was supported by a grant from the
Carnegie Corporation of New York which, however,
is in no way responsible for any statement made.


Method 


The aim of the present study was to see whether 
the technique used by Katz and Braly (6) and Gil- 
bert (2) to investigate ethnic stereotypes would yield 
comparable occupational stereotypes that would be 
relevant to industrial relations. The technique of 
Katz and Braly required subjects to choose from a 
list of 84 adjectives the five which in their opinion 
best described members of a particular ethnic group. 
In the present investigation, 10 occupational groups 
were substituted for ethnic groups and the list of 
adjectives was somewhat altered and extended. The 
sample group consisted of 68 male and 56 female 
university students enrolled for an introductory
course in Psychology. The age range was 17 years 
to 46 years in the male group, with a median of 
20.4 years and a mode of 18 years; in the female 
group the age range was 17 to 30 years, with a 
median of 17.5 years and a mode of 18 years. The 
students were told that the project was a research 
and completed the schedules in class time. In ad- 
dition to naming the adjectives, they were asked to 
rate their political sympathies on a five-point scale 
running from strongly pro-Labour to strongly pro- 
Liberal or Country Party (Labour’s political op- 
ponents). They were also asked to rank the 10 
occupational groups in order of their preference if 
circumstances permitted them to choose any of 
them for their life work. 

The strength of stereotypes was measured by the 
index used by Katz and Braly, which is given by 
the number of adjectives sufficient to account for 
half the total votes cast. Thus if N = 20, and all 
subjects list the same five adjectives, each of these
would account for 20 votes. The total number of
votes is N × 5 = 100, and 2.5 adjectives would account
for 50% of the total votes. Complete agreement
among voters gives a minimum value of 2.5
for the stereotypy index, whatever the number of 
subjects or adjectives, but the theoretical maximum 
for the index equals half the number of adjectives 
in the list. Thus in the example above, if there were 
10 adjectives to choose from and they were chosen 
completely at random, the chance expectation would 
be that each adjective would be chosen (20 X 5)/10 
= 10 times. Five adjectives would then be required 
to account for half the total votes. As Katz and 
Braly used a list of 84 adjectives the stereotypy 
index in their investigation could have gone as high 
as 42. In the present investigation 112 adjectives 
were used, and the theoretical maximum for the 
stereotypy index was 56. 
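
The index described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `stereotypy_index` is hypothetical, and the fractional interpolation within the last adjective is an assumption inferred from the worked example (2.5 adjectives for complete agreement).

```python
def stereotypy_index(vote_counts):
    """Number of adjectives (most frequent first) needed to cover half of all votes.

    vote_counts: iterable of vote totals, one per adjective.
    The last adjective is counted fractionally (an assumption that
    reproduces the 2.5 and 5.0 values in the worked example).
    """
    counts = sorted((c for c in vote_counts if c > 0), reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no votes were cast")
    target = total / 2
    cumulative = 0
    for i, c in enumerate(counts):
        if cumulative + c >= target:
            # i whole adjectives plus the needed fraction of the next one
            return i + (target - cumulative) / c
        cumulative += c


# Complete agreement: 20 subjects all choose the same five adjectives.
complete_agreement = stereotypy_index([20] * 5)
# Chance responding: 100 votes spread evenly over a 10-adjective list.
chance_level = stereotypy_index([10] * 10)
```

Under these assumptions the two calls reproduce the minimum of 2.5 and the chance value of 5 given in the text for a 10-adjective list.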


Results 


Table 1 shows stereotypy indices for the 
ten occupations, and the rank order of pref- 
erence in which the occupations were placed. 
There were no significant differences between 
the sexes, nor between sympathizers with dif- 
ferent political parties. Students expecting to 
enter a particular occupation did not give 
significantly different responses from those 
expecting to enter other occupations. The 
rank difference correlation between the de- 
gree of stereotypy and order of preference 
for the occupations was + 0.79. Table 1 
also shows the main content of the stereo- 
types, in which many differences are evident. 
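
For readers unfamiliar with the rank difference correlation, the following sketch shows the usual textbook formula, rho = 1 - 6 Σd² / (n(n² - 1)). It assumes untied ranks; the function name is hypothetical, and the source does not state how ties, if any, were handled.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Spearman rank difference correlation for two untied rankings.

    ranks_a, ranks_b: sequences of ranks (1..n) for the same n items.
    """
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same length (n >= 2)")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Ten occupations ranked identically on both variables, then in reverse.
same_order = rank_difference_correlation(list(range(1, 11)), list(range(1, 11)))
reverse_order = rank_difference_correlation(list(range(1, 11)), list(range(10, 0, -1)))
```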

Comparing these results with those ob- 
tained in studies of ethnic stereotypes among 
students, it would appear that occupational 
stereotypes are approximately as strong as 
ethnic stereotypes. Katz and Braly (6), 
working with Princeton undergraduates in 
1933, obtained an average stereotypy index 
of 8.5 for ten ethnic groups. The replication 
of their study by Gilbert (2) in 1950 ob- 
tained a much higher average index of 15.3. 
In unpublished studies directed by Taft at 
the University of Western Australia in 1953 
and 1954, students gave stereotypy indices 
of 12.3 and 12.6. 

Katz and Braly found no relation between
respondents’ preference for an ethnic group
and the strength of the stereotype for that
group. In the studies at the University of
Western Australia rank difference correlations
of + .79 and + .78 were found, which corre-
spond exactly with that found between the
strength of occupational stereotypes and or-
der of preference in the present study.


Table 1

Stereotypes of Ten Occupations
(N = 124)

Occupation                Stereotype                                                              Stereotypy Index

School Teacher            well educated, intelligent, tolerant, fairminded, friendly              11.7
Doctor                    intelligent, efficient, well educated, humanitarian, practical
Lawyer                    alert, calculating, well educated, shrewd, clever
Arbitration Court Judge   fairminded, intelligent, open-minded, practical, honest
Factory Owner             ambitious, industrious, practical, efficient, progressive
Politician                ambitious, argumentative, power-seeking, talkative, evasive
Factory Foreman           efficient, industrious, honest, practical, methodical
Trade Union Leader        aggressive, determined, ambitious, argumentative, power-seeking
Factory Worker            friendly, co-operative, honest, imitative, efficient
Coal Miner                rough, tough, friendly, honest, industrious

Note. Occupations have been listed in order of preference.
Only the first five adjectives, in order of frequency of listing,
are shown. The stereotypy index shows the number of adjec-
tives that account for 50% of the votes cast for each occupation.


Discussion 


Admittedly, the Katz and Braly method 
has short-comings, not the least of which is 
the semantics of the large number of adjec- 
tives in the list. The fact remains, however, 
that whatever degree of validity is attributed 
to the ethnic stereotypes obtained by the 
technique must likewise be attributed to oc-
cupational stereotypes. The existence of such
stereotypes seems to accord equally well with
commonsense observation, although we have
no information on their existence in other
groups.

As with ethnic stereotypes, much research 
will be required before we are in a position 
to check the truth or falsity of occupational 
stereotypes. The information available on 
“occupational personalities” is limited in ex- 
tent and heterogeneous in nature and in the 
methods by which it has been obtained (7). 
It is not in a form suitable for comparison 
with stereotypes of the sort found in this 
and previous investigations. On the whole, 
vocational counselors have focused their at- 
tention on abilities and interests. 

The present study throws no light on the 
factors responsible for the stereotypes. In 
particular, there is no indication of selective 
perceptual distortion arising from political 
group membership, although it is possible 
that distortion might occur in a more sharply 
divided population, such as might be found in 
industry. 

The nature, extent and influence of occu- 
pational stereotypes appear to offer many 
fruitful opportunities for research. The rele- 
vance of such stereotypes to industrial rela- 


tions and personnel management is obvious; 
they are undoubtedly an important factor in 
the way in which members of an occupational 
group are perceived and constitute part of
the role expectations for an occupation. It
seems likely that they play an important part 
in vocational choice. A developmental study 
of the stereotypes held by adolescents of dif- 
ferent ages would produce useful information 
on the process of vocational choice and on 
the formation of occupational stereotypes. 
Such research should not be limited to the 
method of Katz and Braly. 


Received July 1, 1957. 


References 


1. Asch, S. E. Forming impressions of personalities.
   J. abnorm. soc. Psychol., 1946, 41, 255-290.

2. Gilbert, A. M. Stereotype persistence and change
   among college students. J. abnorm. soc. Psy-
   chol., 1951, 46, 245-254.

3. Haire, M. Role-perceptions in labor-management
   relations: an experimental approach. Industr.
   and Labor Relat. Rev., 1955, 8, 204-216.

4. Haire, M., & Grunes, W. Perceptual defense proc-
   esses protecting and organizing perception of
   another personality. Human Relat., 1950, 3,
   403-412.

5. Haire, M., & Morrison, F. School children’s per-
   ceptions of labor and management. Unpub-
   lished manuscript, Univer. California, 1954.

6. Katz, D., & Braly, K. W. Racial stereotypes of
   100 college students. J. abnorm. soc. Psy-
   chol., 1933, 28, 175-193.

7. Roe, Anne. The psychology of occupations. New
   York: Wiley, 1956.

8. Stagner, R. Stereotypes of workers and execu-
   tives among college men. J. abnorm. soc.
   Psychol., 1950, 45, 743-748.





Journal of Applied Psychology 
Vol. 42, No. 2, 1958 


Personality Needs of Under- and Overachieving Freshmen * 


G. Gary Gebhart and Donald P. Hoyt 
Kansas State College 


Reviews of the literature on the differential 
personality characteristics of over- and un- 
derachievers have revealed inconsistent find- 
ings (3, 7). Studies in this area have been 
highly varied in approach, which probably 
accounts for much of this confusion. Experi-
mental weaknesses may also contribute sig-
nificantly. For example, (a) various control
factors have been neglected: sex differences
have been confounded (e.g., 12); level of
academic progress has varied among Ss (e.g.,
1); educational-vocational orientation has not
been controlled (e.g., 1, 8, 10); differences in
over- and underachievement at various abil-
ity levels have not been considered (e.g., 11,
13, 15). (b) The definitions of over- and
underachievement have been vague (e.g., 2,
12). (c) Personality tests developed for
quite different purposes have been used (e.g.,
1, 5). (d) The Ss were representative of
very limited or unusual populations (e.g., 4,
14).

The major purpose of this study was to in- 
vestigate some personality correlates of over- 
and underachievement, while avoiding the ex- 
perimental pitfalls of the earlier studies. This 
attempt to control variables previously neg- 
lected led to three subsidiary purposes of the 
study. These were: to investigate whether 
or not Ss at different ability levels had dif- 
ferent personality needs; to determine whether 
or not the personality correlates of over- and 
underachievement were the same at various 
ability levels; to investigate personality dif- 
ferences between groups having differing vo- 
cational orientations. 


Procedure 
Sample 


The population investigated in this study consisted 
of male freshmen who first enrolled in the Schools 
of Engineering and Architecture or Arts and Sci- 
ences at Kansas State College in the fall of 1956-57. 
A total of 430 Engineering students and 310 Arts 
and Sciences students were considered. 


1 This study is taken from the M.S. thesis of the 
first author (7), done under the direction of the 
second author.

Each school group was subdivided into three abil-
ity levels. Any student with a predicted grade point
average² (GPA) of less than .70 (on the 3-point
system) was defined as a low ability student. Those
with predicted GPA’s over 1.30 were defined as high
ability students. All others were called average in
ability.

The Ss were further divided into under- and over- 
achievers. If the student’s obtained first semester 
grades were higher than those predicted, he was 
called an overachiever; if lower, he was called an 
underachiever. 

This division provided 12 groups (2 schools, 3 
ability levels, and 2 levels of achievement). The 12 
groups were then examined individually. In each 
group, the 20 Ss whose obtained GPA was most 
discrepant from that predicted were included in the 
present sample. The mean discrepancy between pre-
dicted and obtained GPA’s for the 240 Ss thus se-
lected was 1.01. Group means ranged from .72 to
1.26.³
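
The grouping and selection rules described in this section can be sketched as follows. This is only an illustration under stated assumptions: the record layout (dictionaries with `school`, `predicted`, and `obtained` keys) and all function names are hypothetical, and the source does not say how a student whose obtained GPA exactly equalled the predicted GPA was treated, so such cases are simply skipped here.

```python
from collections import defaultdict


def ability_level(predicted_gpa):
    """Ability level from predicted GPA on the 3-point system (cutoffs from the text)."""
    if predicted_gpa < 0.70:
        return "low"
    if predicted_gpa > 1.30:
        return "high"
    return "average"


def achievement_level(predicted_gpa, obtained_gpa):
    """'over' if obtained grades exceed prediction, 'under' if below, else None."""
    if obtained_gpa > predicted_gpa:
        return "over"
    if obtained_gpa < predicted_gpa:
        return "under"
    return None  # tie: handling not stated in the source (assumption: excluded)


def select_sample(students, per_group=20):
    """Keep, in each school x ability x achievement group, the most discrepant Ss."""
    groups = defaultdict(list)
    for s in students:
        level = achievement_level(s["predicted"], s["obtained"])
        if level is None:
            continue
        groups[(s["school"], ability_level(s["predicted"]), level)].append(s)
    return {
        key: sorted(
            members,
            key=lambda s: abs(s["obtained"] - s["predicted"]),
            reverse=True,
        )[:per_group]
        for key, members in groups.items()
    }
```

With 2 schools, 3 ability levels, and 2 achievement levels, a full data set would yield the 12 groups of 20 Ss described above.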


Design 


Scores on each of the 16 variables of the Edwards 
Personal Preference Schedule (EPPS), obtained from 
the freshman testing program, were collected for all 
240 Ss. The study employed a factorial design (2 
schools X 3 ability levels X 2 achievement levels), 
and the statistical tool was the analysis of variance. 
A total of 16 analyses was performed, one for each 
of the 16 EPPS scales. 


Results 
Major Hypotheses 


The major set of hypotheses of the study 
concerned possible differences between over- 
and underachievers. Put in the null form, 
these read: on the variables in question, no 
differences obtain between over- and under- 
achievers. Table 1 summarizes the results of 
these 16 analyses of variance. 

Of the 16 null hypotheses, 7 were rejected 


2 Local studies had established the Pre-Engineering
Ability Test to be the best predictor in Engineering
(r = .60) and the A.C.E. to be the best predictor in
Arts and Sciences (r = .55). Therefore these were
the two ability tests employed in this study.

3 Because the experimental design required exactly
20 Ss in each of the 12 groups, it was necessary to
include a few students who were “marginal” over-
or underachievers. Thus, the GPA of seven Ss was
within 0.5 of the predicted GPA; for the remaining
233, this discrepancy was at least 0.5.

Table 1
EPPS Scores for Groups of Over- and Underachievers 





Under- 
achievers 
(N = 120) 


Over- 
achievers 
(N = 120) 
Edwards 

Need 





Mean Mean : F 





Ach. 
Det. 
Ord. 
Exh. 
Aut. 


14.4 
12.0 
11.1 
13.8 


16.3 
12.7 


16.341*** 

2.714 
12.3 4.544* 
123 3. 1.467 
13.4 4. — 


Aff. ; 14.8 
Int. . 14.5 
Suc. d ' 10.6 

14.9 


5.559* 
3.975* 
3.231 
1.073 
1.981 


11.555*** 
7.808** 
1.521 


6.104* 





at the 5% level of confidence or beyond. The 
results of these analyses may be summarized 
as follows: (a) Overachievers scored signifi- 
cantly higher on the following scales—Achieve- 
ment, Order, Intraception, and Consistency. 
The mean difference between the two groups 
on Achievement was especially significant. 
(b) Underachievers scored significantly higher
on the following scales—Nurturance, Affilia- 
tion, and Change. The mean differences be- 
tween the two groups on Nurturance and 
Change were especially significant. 


Subsidiary Hypotheses 


The first set of subsidiary hypotheses con-
cerned the differences that might exist be-
tween ability level groups. In the null form
these hypotheses read: concerning the vari-
able in question, no differences obtain among
the High, Average, and Low ability groups.
The results of the 16 analyses of variance 
are contained in Table 2. 

The null hypotheses were rejected in 9 of
the 16 analyses. These results may be sum-
marized as follows: (a) High ability groups
scored consistently and significantly higher
on the Achievement, Exhibition, Autonomy,
Dominance, and Consistency scales. (b) Low
ability groups scored consistently and signifi-
cantly higher on the Deference, Order, Abase-
ment, and Nurturance scales.

The second set of subsidiary hypotheses 
was concerned with whether or not the per- 
sonality correlates of over- and underachieve- 
ment were the same at the three ability levels. 
This was tested in the analysis of variance by 
the interaction between ability and achieve- 
ment levels. 

Only 2 of the 16 analyses yielded an F 
large enough to warrant rejection of the null 
hypothesis, which in this case is stated as fol- 
lows: with regard to the variable in question, 
no differences attributable to the interaction 
of ability and achievement obtain among the 
groups. 

The null hypotheses that were rejected 
were found on the Heterosexuality and Con- 
sistency scales. In both cases, .05 > P > .01. 

The interaction found on Heterosexuality 
is illustrated in Table 3. High ability groups 
tended to score higher on this need than 
did low ability groups. Also, underachievers 
tended to score higher than overachievers. 
But low ability overachievers scored higher 
than low ability underachievers. 

The interaction found on the Consistency 
scale is shown in Table 4. Overachievers
scored significantly higher on this scale than 
did the underachievers, and high ability 
groups scored significantly higher than did 
low ability groups. Yet, high ability under- 
achievers scored higher than high ability over- 
achievers. 

The third set of subsidiary hypotheses con- 
cerned the differences between groups with 
different vocational orientations. In the null 
form, the hypotheses may be stated as fol- 
lows: concerning the variable in question, no 
EPPS differences obtain between students in 
the two schools (Engineering and Arts and 
Sciences). 

The null hypothesis was rejected twice. 
Engineering groups scored significantly higher 
on need Endurance (P < .001), while Arts 
and Sciences students scored significantly 
higher on Dominance (P < .05).

Table 2
EPPS Scores for Groups at Different Ability Levels 


High 
(N = 80) 
Edwards -- - 
Need 


15.5 
12.9 
11.9 
14.6 
13.0 


15.4 
14.0 
11.3 


Ach. 
Def. 
Ord. 
Exh. 
Aut. 


Aff. 
Int. 
Suc. 
Dom. 
Aba. 
Nur. 
Chg. 
End. 
Het. 
Agg. 


CS. 


Discussion 


In the interest of brevity, only the findings 
relating to the major hypotheses will be dis- 
cussed. 

Gough (8) has recently published two 
personality scales designed to predict aca- 
demic achievement. These scales are called 
Ac (achievement via conformance) and Ai 
(achievement via independence), implying 
that different personality characteristics may 


Table 3 


Heterosexuality Scores for Over- and Underachievers 
at Three Ability Levels 
(N = 40 in each group) 


Ability 


Average 


Achievement 


High 





Under 


Mean 
SD 


16.7 
6.0 
Over 


Mean 
SD 


Average 
(N = 80) 


Mean SD 


Low 
(N = 80) 
Mean SD F 


14.310*** 
7.372***
3.131* 
6.725** 
4.330* 


3.6 3.6 
2.5 13.1 3.2 
4.0 3.4 
3.1 . 3.1 
3.2 . 3.6 


4.3 3.8 
4.7 3.7 
4.1 3.2 
4.1 3.5 
4.2 3.8 


4.1 4.3 
4.3 

5.2 4.2 
6.1 5.3 
4.2 ; 3.6 


1.8 2.1 


1.280 
7.572*** 
4.702** 
3.129* 
1.689 


1.344 
1.898 


5.072** 


be associated with the same behaviors (aca- 
demic achievement) but for different reasons. 

While little support for Gough’s suggested 
dichotomy was found in this study (neither 
Deference nor Autonomy was associated with 
under- or overachievement), the suggestion 
regarding a variety of personality patterns 
among achievers and non-achievers does ap- 
pear to have merit. 

On the basis of the present study, three 


Table 4 
Consistency Scores for Over- and Underachievers 
at Three Ability Levels 
(N = 40 in each group) 





Ability 


Achievement Average Low 


Under 


Mean 10.3 


: 9.7 
SD 6 1.8 


2.6 


Over 


11.3 10.9 
2.0 1.7 


Mean 
SD 













different patterns of overachievement can be
hypothesized: (a) overachievement associated
with a drive to complete (Achievement); (b)
overachievement associated with a drive to
organize or plan (Order); and (c) over-
achievement associated with intellectual curi-
osity (Intraception). Similarly, two patterns
of underachievement may be hypothesized:
(a) that associated with a need for variety
(Change), wherein academic studies may ap-
pear boring and routine; and (b) that asso-
ciated with social motives (Affiliation, Nur-
turance), wherein friendship may be placed
above scholarship. The fact that the scales
involved do not intercorrelate significantly 
(6) supports the notion that several rela- 
tively distinct patterns, rather than a single 
pattern, are involved. 

Further studies to test these hypotheses 
would seem worth while. A pattern analysis 
of the EPPS or an intensive clinical study 
of under- and overachievers would probably 
throw additional light on this problem. 


Summary and Conclusions 


The major purpose of the study was to in- 
vestigate relationships of scores on the EPPS 
to under- and overachievement. Minor pur- 
poses were to study EPPS differences between 
schools and between ability levels, and to in- 
vestigate interactions among these three fac- 
tors. 

A sample of 240 freshman men at Kansas 
State was chosen such that 20 were included 
in each of 12 groups. The groups were classi- 
fied as to achievement (under or over), abil- 
ity (high, average or low), and school (En- 
gineering or Arts and Sciences). 

Within the limits of the sample employed, 
the following conclusions were reached: 

1. Overachievers scored significantly higher 
than underachievers on the Achievement, Or- 
der, Intraception and Consistency scales, and 
significantly lower on the Nurturance, Affilia- 
tion, and Change scales. 

2. High ability Ss scored significantly higher
than those of low ability on the Achievement, 
Exhibition, Autonomy, Dominance, and Con- 
sistency scales, and significantly lower on the 
Deference, Order, Abasement, and Nurtur- 
ance scales. 

3. Engineering students scored significantly 
higher than Arts and Sciences students on the
Endurance scale, and significantly lower on
the Dominance scale. 

4. Two interactions between ability and 
achievement levels were found, one on the 
Heterosexuality scale and the other on the 
Consistency scale. 

5. Hypotheses regarding need patterns of 
under- and overachievers were developed. 


Received July 2, 1957.


The studies of radiotelegraphy by Bryan
and Harter (1, 2) probably represent the first
of all attempts to apply quantitative, scien-
tific procedures to the systematic study of
skill performance. Since this pioneer study,
skill in radiotelegraphy has been examined 
from a variety of viewpoints. West (13), as 
well as Windle, Sidman, and Keller (15), 
have provided recent comprehensive reviews 
of the extensive variety of studies on learn- 
ing and training; Taylor (10), and more 
recently Craeger (3), have reviewed studies 
on selection procedures for radiotelegraphers. 
The primary task investigated in most of 
these previous studies was skill in receiving 
International Morse Code. 

Despite the variety of studies on Morse 
Code reception, there is actually very little 
known about the nature of the fundamental 
abilities underlying proficiency in this skill. 
In general, evidence in the area of selection 
indicates that printed, academic type tests 
achieve only low predictions of code profi- 
ciency, and certain auditory tests yield 
higher predictions. Beyond this we know 
little about the fundamental aptitudes in- 
volved. A recent study by Fleishman (4), 
through the inclusion of a large variety of 
auditory tests, sought to specify more pre- 
cisely the kinds of auditory measures provid- 
ing the best predictions of subsequent code 
proficiency. Although 10 different auditory 


1 This research was performed while the writers
were with the Air Force Personnel and Training Re-
search Center, Lackland Air Force Base, Texas, in
support of Project No. 7706, Task No. 27002. Per- 
mission is granted for reproduction, translation, pub- 
lication, use, and disposal in whole or in part by or 
for the U. S. Government. 


tests were evaluated, it was found that a com- 
bination of two tests (a measure of initial 
code learning and a measure of rhythm dis- 
crimination) yielded maximum prediction and 
beyond this, addition of other measures pro- 
duced no increase in prediction. This was 
true even though a variety of the remaining 
auditory tests possessed significant individual 
validities. It is obvious that in order to ef- 
fect improvements in current selection pro- 
cedures, we need to know more about the 
fundamental abilities involved in such tests 
as well as in our criteria of code-receiving 
proficiency. Such knowledge would have im-
plications, not only for problems of code pro-
ficiency, but also for the broader problems
concerning the nature of auditory-perceptual 
processes. 

The present study is an attempt to obtain 
this kind of information through the applica- 
tion of factor analysis techniques to both 
aptitude measures and proficiency criteria. It 
represents a follow-up to our earlier study 
(4, 6). The present study utilizes a different 
sample and improved measures of certain of 
the auditory tests previously found valid by 
Fleishman (4, 6). In addition, we included 
certain printed tests in aptitude areas not 
previously evaluated against code proficiency. 


Method 


A battery of 14 specifically selected or designed 
tests was administered to 310 airmen prior to their 
entrance into radio operator training. Five of these 
tests were auditory tests (recorded on high fidelity 
tape); the remaining nine were printed tests. All of 
these were hypothesized to measure abilities relevant 
to success in the learning of Morse Code. A brief 
description of each test variable follows:

Aural Tests


1. Rhythm Discrimination (Form D).² This is
our fourth revision of an adaptation of the rhythm 
subtest of the Seashore Measures of Musical Talent 
(4, 6). The examinee hears a series of 70 pairs of 
rhythm patterns (beats within each pair presented 
in rapid succession). After each pair, he must mark, 
on an IBM answer sheet, under the S if the pat- 
terns in each pair are the same, under the D if they 
are different, and under the (?) if he cannot decide. 

2. Dot Perception Test (Form C). This is our 
third revision of a test described previously (4, 6). 
The examinee hears a series of code signal groups, 
consisting of rapid patterns of “dots” and “dashes.” 
Each signal group is about the same over-all length 
in terms of time, but the internal arrangement of 
the “dots” and “dashes” varies from item to item. 
The “dots,” however, always come in a single series, 
at the end or beginning of the signal group. For 
each group (items) the examinee simply marks (on 
an IBM answer sheet) the number of “dots” pre- 
sented (1, 2, 3, 4, or 5) within each group. The 
speed of transmission increases at intervals through 
the test. 160 items. 

3. Copying Behind. The examinee hears groups 
of numbers called out in rapid succession (e.g., 4-2- 
5-1-3). His task is to mark under the proper num- 
ber, in turn, on an IBM answer sheet, attempting to 
keep up with the pace set by the narrator. The pace 
increases at intervals through the test. 240 items. 

4. Hidden Tunes.³ This test has been described
earlier by White (14). The examinee hears a series
of short tunes presented in pairs. The second tune
is always longer than the first in each pair, and the
examinee must determine if the second tune in-
cluded the first. For each pair he marks under yes
or under no on his answer sheet. 50 items.

5. Army Radio Code Test (ARC). This test (3, 
4, 6) is designed to measure the speed with which 
the examinee can learn certain actual Morse Code 
signals (for I, N, and T). About 25 min. of the 
test involves the practice, with knowledge of results, 
of these signals under increasingly higher rates of 
speed. For the test period the examinee marks un- 
der the I, N, or T on the special IBM answer sheet 
as each signal (dot-dash, dash-dot, dash) is pre- 
sented in rapid succession. 150 test items. 


Printed Tests 


Variables 6 through 10 are tests developed by 
Thurstone to measure “closure” factors.⁴ Variables
6 through 8 are reported to be reference tests of a 


2 The writers are indebted to Harold Seashore and 
The Psychological Corporation for permission to 
modify the Seashore test for our purposes. 

3 The writers are indebted to Edward L. Walker
and Benjamin White for providing a copy of this
test.


4 Permission to reproduce these tests for our pur-
poses was granted by L. L. Thurstone.

“Speed of Closure” factor, defined as “the ability to
unify an apparently disparate perceptual field into 
a single percept” (7). It seemed a reasonable hy- 
pothesis that receiving Morse Code may involve such 
an ability. At least it appears that a stream of 
stimulus signals must somehow be unified, given 
structure, and broken into the appropriate units. 
The tests representative of this ability were: 

6. Four-letter Words. Twenty-two 46-letter lines 
of capital letters are presented on a printed page. 
The task is to circle all the four-letter words which 
can be found spelled out in this array. Thus, the 
examinee scans along these lines and every time he 
finds four letters in sequence which spell a word, he 
circles them. 

7. Mutilated Words. Each item presents a word 
with parts of each letter missing. The examinee 
writes out the full word in an adjacent space. 

8. Gestalt Completion. This is Thurstone’s adap- 
tation of the Street Gestalt Completion Test. Draw- 
ings are presented which are composed of black 
blotches representing only suggestive parts of the 
objects portrayed. The examinee attempts to “make 
sense” out of these and for each drawing writes 
down the name of the object. 

Variables 9 and 10 are reported as reference tests 
of a “Flexibility of Closure” factor, which is defined 
as “the ability to keep in mind a definite configura- 
tion so as to identify it in spite of perceptual dis- 
tractions” (7). This factor differs from the “Speed 
of Closure” factor in that the examinee knows the 
particular configuration he is looking for. 

9. Designs. Three hundred geometrical designs are
presented, in 40 of which a sigma (Σ) is embedded.
The task is to mark the designs in which the sigma
occurs.

10. Concealed Figures. This is Thurstone’s adap- 
tation of the Gottschaldt Figures Test. The task is 
to select the one of five given geometrical figures 
that is contained in a more complex geometrical 
figure. 

11. Marking Accuracy. The task is simply to 
mark a standard IBM answer sheet in which one of 
the five alternatives to each item has been over- 
printed with a small circle. The examinee’s task is 
merely to mark the answer sheet as rapidly as pos- 
sible under the indicated circles, going from one 
item to the next. In a sense, this is the visual coun- 
terpart of the aural Copying Behind Test described 
above. 

12. Word Knowledge. This is a vocabulary test.

13. Background for Current Affairs. This is an 
informational test covering current, recent, and his- 
torical events. This test, together with the Word 
Knowledge Test, has consistently defined a verbal 
factor on previous Air Force studies. 

14. Pattern Comprehension (9). A series of draw- 
ings require the examinee to visualize the relation- 
ships between components of solids and their un- 
folded flat projections.

Note.—Values above the diagonal are the obtained (restricted) correlations; values below the diagonal are corrected for
restriction of range. Decimals omitted.


The Criterion of Code Proficiency 


15. Number of Classroom Days to Attain a Code 
Receiving Speed of 14 Groups Per Minute. All stu- 
dents in the course receive frequent code checks un- 
der comparable conditions. Students are held back 
until the checks at each code speed are passed satis- 
factorily. Time to reach given code speeds repre- 
sents a uniquely unambiguous criterion of proficiency. 
(The advantages of a “time to learn” criterion in 
this context have been described recently by Gordon 
[8].) The criterion chosen was that of 14 groups 
per minute which is the stage at which the student 
must qualify to be admitted to further phases of 
radio operator training. There are wide individual 
differences, even after selection, in the amount of 
time needed to reach this criterion. Practically all 
failures in the course are attributed to difficulties in 
receiving Morse Code, rather than to difficulties in 
sending code or in other academic subjects. 


Data Analysis Procedures 


The intercorrelations among these 15 variables 
were obtained. These are presented in the upper 
half of Table 1. Since the criterion variable is in 
terms of time to attain a given proficiency level, the 
validities obtained were uniformly negative. These 
signs were reflected for our purposes. Thus, the 
positive validities in Table 1 indicate that a particu-
lar test is positively related to “good” criterion per-
formance (a small number of days to reach 14 GPM).

Since all of the Ss in our sample had been selected 
for training on the basis of their scores on the Air- 
man Classification Battery, it was necessary to cor- 


rect these correlations for restriction of range. The 
basis for selection was the Radio Operator Aptitude 
Index (ROAI), a composite score derived from five 
classification tests. Corrections of all the obtained 
correlations were made in accordance with the pro- 
cedures outlined by Thorndike (11). The bottom 
half of Table 1 presents the corrected coefficients. 
These are the coefficients utilized in the factor 
analysis. 
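
The text says only that the corrections followed Thorndike (11). As a rough illustration of what such a correction involves when selection is made on a third variable (here the ROAI composite), the sketch below uses the three-variable formula commonly attributed to Thorndike as "Case 3". Treating this as the formula actually used is an assumption, and the function and parameter names are hypothetical.

```python
import math


def correct_for_indirect_restriction(r_xy, r_xz, r_yz, sd_ratio):
    """Estimate the unrestricted correlation between x and y.

    r_xy, r_xz, r_yz: correlations observed in the selected (restricted) group,
        where z is the variable on which selection was made.
    sd_ratio: unrestricted SD of z divided by restricted SD of z.
    Assumes the commonly cited three-variable (Case 3) correction.
    """
    k2 = sd_ratio ** 2
    numerator = r_xy - r_xz * r_yz + r_xz * r_yz * k2
    denominator = math.sqrt(
        (1 - r_xz ** 2 + r_xz ** 2 * k2) * (1 - r_yz ** 2 + r_yz ** 2 * k2)
    )
    return numerator / denominator


# Illustrative (made-up) values: a test correlating .30 with the criterion in
# the selected group, with the selection composite's SD halved by selection.
corrected = correct_for_indirect_restriction(0.30, 0.40, 0.50, sd_ratio=2.0)
```

When `sd_ratio` is 1 (no restriction) the function returns the observed correlation unchanged, and when x is itself the selection variable it reduces to the familiar two-variable correction.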

Six factors were extracted from the correlation 
matrix using Thurstone’s Centroid Method (12). 
Orthogonal rotations were accomplished, using Zim- 
merman’s graphical procedure (16), until simple 
structure appeared to be closely approximated. The 
sixth centroid factor extracted was considered to 
contain only residual variance and was not rotated. 


Results 


Table 2 presents the centroid and rotated 
factor matrices. The rotated factors were 
interpreted for psychological meaningfulness. 
Loadings above .30 are listed in turn for each 
factor. 


Factor I is interpreted as Visualization.

No.  Variable                 Loading
14   Pattern Comprehension
10   Concealed Figures
 9   Designs
 8   Gestalt Completion
11   Marking Accuracy


Table 2


Centroid and Rotated Factor Matrices 








Centroid Loadings* 


Factors 





Variable 





. Rhythm Discrimination 
. Dot Perception 

. Copying Behind 
Hidden Tunes 

. Army Radio Code 

. Four-letter Words 

. Mutilated Words 

. Gestalt Completion 

. Designs 

. Concealed Figures 

. Marking Accuracy 

. Word Knowledge 

. Backgr. Curr. Afirs. 

. Pattern Comprehension 
. Proficiency Criterion 


Ya?/k 


CHONIAMNP WHE 


Rotated Loadings* 
Factors* 


IV Vv VI> 
Cs APS Res 


00 12 
32 —19 
54 07 
—01 —04 
51 —06 
05 

16 

16 

—23 


17 
—06 
08 





* Decimals omitted. 

> Factor VI—not rotated. 

© Factors are identified as follows: I—Visualization; IT 
Closure; V—Auditory Perceptual Speed ; VI—Residual. 


Although three of these variables have been 
listed (7) as definers of “closure” factors, in- 
terpretation as the better established visuali- 
zation factor appears much less strained. As 
we shall see, a separate closure factor was 
identified in our analysis, and the three 
“closure” type tests on the present factor do 
not represent the same closure factor accord- 
ing to Thurstone’s definitions (7). Thus, 
Designs and Concealed Figures represented 
his “Flexibility of Closure” factor, and Ge- 
stalt completion is purported to be a meas- 
ure of “Speed of Closure.” Furthermore, the 
main definer of Factor I, “Pattern Compre- 
hension,” has been found repeatedly (5, 9) 
to define Visualization. Tests of this factor 
seem to require mental manipulation of visual 
objects, in which it is necessary to move, 
twist, turn, or invert one or more parts of a 
configuration (in imagination) and to recog- 
nize the new position, location, or changed ap- 
pearance after the modification. This seems 
to fit Pattern Comprehension as well as the 
Thurstone closure tests, all of which involve
shapes. The loading of Marking Accuracy
on this factor is not explainable in terms of 
this definition, but it is to be noted that there 
is quite a gap between the main definers of 
this factor (loadings .50-.69) and this latter
test (loading .32). Whatever the nature of 
this factor, we find that the criterion of code 
proficiency is not loaded on it. 


Factor II is identified as the Verbal Ability factor.


Variable Loading 


Word Knowledge 76 
Background for Current Affairs 74 
Gestalt Completion 40 
Mutilated Words 38 


The main definers are those which have 
defined the Verbal Ability factor in many 
previous studies (e.g., 5, 9). Loadings of 
Gestalt Completion and Mutilated Words are 
consistent with this interpretation. The for- 
mer requires the spelling out of words and 
the latter, the recognition of words. The 
test Four-letter Words had the next highest 
loading (.23) on this factor. Again, the criterion
of code proficiency does not load on
this factor. 


Factor III is interpreted as Auditory Rhythm Perception. 


Variable Loading 
Hidden Tunes 62 
Dot Perception (Form C) 58 
Rhythm Discrimination (Form D) 56 
Copying Behind 32 
ARC 30 
Criterion 31 


The factor is common to all of the aural 
tests included and to none of the printed 
tests. Perception of rhythm appears to be 
the most critical feature of the three tests 
with loadings above .50, in all of which the S 
receives his stimulus signals in groups. This 
factor appears general to auditory tasks re- 
gardless of the kind of auditory signal in- 
volved (e.g., code signals, tunes, beats, call- 
ing of numbers). Of special interest is the 
loading of the Morse Code proficiency cri- 
terion on this factor. 


Factor IV is identified as Speed of Closure. 


No. Variable Loading
7 Mutilated Words 55
6 Four-letter Words 54
3 Copying Behind 31
15 Criterion 44


The main definers of this factor are the 
tests designed by Thurstone as measures of 
the Speed of Closure factor. In Mutilated 
Words and Four-letter Words, the S does not 
know the stimulus unit that he is looking for, 
but must “organize” the stimulus material 
into meaningful units. It is to be noted that 
the “closure” tests not imposing this require- 
ment do not appear on this factor. As a 
methodological point, it should be noted that 
the inclusion of other verbal tests in the 
study made it possible to partial out the ver- 
bal variance in these tests and in the cri- 
terion, yielding a more precise estimate of the 
“closure” variance present. Of special inter- 
est is the high loading of the criterion vari- 
able on this factor. This is consistent with 
our original hypothesis that in receiving 
Morse Code, a major task is to group an apparently
disparate auditory perceptual field
into meaningful units. The finding that this 
kind of closure factor is common to auditory, 
as well as to visual, perceptual tasks would 
appear to have important general significance. 


Factor V is identified as an Auditory Perceptual 
Speed factor. 


No. Variable Loading
3 Copying Behind 54
5 Army Radio Code 51
2 Dot Perception 32
10 Concealed Figures 31
15 Criterion 31


The three auditory tests loaded on this fac- 
tor are the ones which emphasize speed the 
most. Copying Behind was especially de- 
signed as a speed measure, and both the Army 
Radio Code Test and the Dot Perception Test 
carry the subject to increasingly high speed 
levels. In each test, it is necessary for the S 
to respond almost immediately or he will miss 
the next stimulus presented. In most cases 
he is responding to an earlier stimulus at the 
same time that new stimuli are being pre- 
sented. The two auditory tests, Rhythm 
Discrimination and Hidden Tunes, which do 
not appear on this factor, allow the S more 
time to make a response after the presenta- 
tion of each stimulus pair. 

The low, but significant, loading of the 
Concealed Figures Test is, of course, not entirely
consistent with our interpretation of
this factor as “Auditory Perceptual Speed.” 
It is possible that the presence of Concealed 
Figures suggests that this factor is the same 
as the Perceptual Speed factor found among 
certain kinds of printed tests involving rapid 
discrimination of visual detail (7, 9). How- 
ever, there is no way to assess this in the 
present study as no reference tests of Per- 
ceptual Speed have been included. The pos- 
sibility that Perceptual Speed may extend to 
auditory and visual perception is worthy of 
future study. 

The Marking Accuracy Test, which sam- 
ples sheer speed of marking in the slots of an 
answer sheet, has a loading of only .28 on 
this factor. Since this test requires responses 
identical to the Copying Behind Test, but in- 
volves no appreciable stimulus discrimination, 
it would appear that the emphasis in this factor
is on speed of stimulus discrimination and not 
on speed of response. 

As can be seen, the criterion of Morse Code 
reception is loaded on this factor. This seems 
completely consistent with our factor descrip- 
tion inferred from the loadings of the test 
variables. 


Discussion 


These results suggest that the following 
three ability factors contribute to individual 
differences in proficiency in Morse Code re- 
ception. 

1. Speed of Closure. The ability to unify 
an apparently disparate perceptual field into 
meaningful component units. 

2. Auditory Rhythm Perception. The abil- 
ity to discriminate rhythmic patterns inher- 
ent in particular auditory stimulus groups. 

3. Auditory Perceptual Speed. The speed 
with which an individual can discriminate in- 
dividual auditory stimulus signals presented 
in rapid succession. 

From the communality estimate (Table 2)
it was also shown that the factors identified
in the present study accounted for 43% of
the variance in the code proficiency criterion.
This suggests additional factors may yet be
found through the inclusion of additional
kinds of ability variables. However, a communality
of this magnitude implies that a
multiple R of about .66 (√h²) may be achieved
using the kinds of predictor variables investigated
here. This is higher than has been
achieved in previous studies of radioteleg- 
raphers in this operational training setting 
(3, 4, 6). Multiple correlational studies are 
now underway, using a wider variety of pre- 
dictor and criterion variables. 
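
The arithmetic behind the figure of .66 can be sketched briefly. The communality is the proportion of criterion variance shared with the common factors, so its square root gives the multiple R that would be reached if the predictors measured those factors completely. The function name is hypothetical, and the sketch assumes nothing beyond the 43% figure reported above.

```python
import math


def max_multiple_r(communality):
    """Upper limit of the multiple R implied by a criterion communality (h squared)."""
    if not 0.0 <= communality <= 1.0:
        raise ValueError("communality must lie between 0 and 1")
    return math.sqrt(communality)


# Criterion communality reported in the text: .43
print(round(max_multiple_r(0.43), 2))
```

In practice the attainable R would be somewhat lower, since no battery of tests measures the factors without error.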

The present results confirm earlier indica- 
tions that aural tests are likely to give better 
predictions of code proficiency than most 
printed test variables. These results help 
rationalize this empirical finding in terms of 
underlying abilities. A major contribution
of the present study is the finding of a 
“closure” factor in code proficiency. Meas- 
ures of this domain have not been included 
in previous studies on the selection of radio- 
telegraphers. 

It should be stressed that the relative importance
of the factors identified may not
hold for very early or very advanced levels 
of proficiency. Studies are in progress on 
possible changes in such patterns as a func- 
tion of practice. It is possible that studies 
of this type, done at successive levels of proficiency,
may throw some light on the processes
involved in the learning of Morse Code.

Methodologically, these results indicate that 
properly designed factor analyses, including 
both predictor and criterion variables, may 
provide important leads regarding the fun- 
damental abilities underlying proficiency in 
complex jobs. 


Summary 


Fourteen auditory and printed aptitude 
measures were administered to students prior 
to entry into training for radiotelegraphy. 
These 14 measures, together with a criterion 
of proficiency in learning to receive Morse 
Code, were subjected to factor analysis study. 
Five factors were identified as Visualization, 
Verbal Ability, Speed of Closure, Auditory
Rhythm Perception, and Auditory Perceptual
Speed. Three of these, Auditory
Rhythm Perception, Auditory Perceptual 
Speed, and Speed of Closure, were found to 
contribute to the criterion of subsequent code 
proficiency. 


Received July 12, 1957. 


Recovery From Unusual Aircraft Attitudes Under the
Influence of Vertigo ¹


J. E. Conklin and O. H. Lindquist 


Minneapolis-Honeywell Aeronautical Division 


The conventional aircraft attitude indicator 
is termed an inside-out display since it repre- 
sents the artificial horizon as the moving ele- 
ment, similar to what the pilot would per- 
ceive in contact flying. An outside-in dis- 
play represents the same information by 
presenting the artificial horizon as the sta- 
tionary element and depicting roll and pitch 
deviations by a moving drone. Research has 
demonstrated that the latter presentation of 
aircraft attitude has certain advantages over 
the more “realistic” indicator (1, 2, 3). That 
the moving drone concept, however, is not 
currently used is due to a concern over the 
transfer effects between instruments. Recent 
flight tests have shown that transfer from a 
moving horizon to a moving drone display is 
quickly achieved. It is believed, however, 
that unless all aircraft are equipped with a 
moving drone display the transfer effect from 
this display to the conventional one may be 
negative and a possible cause of accidents,
particularly when the pilot is faced with a
critical situation such as disorientation due to 
vertigo. 


Method 


The study was designed to measure recovery per- 
formance with two contrasting concepts of attitude 
indication under the influence of vertigo. It was 
also designed to test whether extensive training with 
the moving drone display interfered with recovery
performance with the moving horizon or conven- 
tional indicator. 

Apparatus. The apparatus consisted of (a) the 
moving horizon and moving drone displays with as- 
sociated circuits, (b) analog computers to simulate 
aircraft dynamics, (c) problem generator for train- 
ing purposes, (d) joy-stick control mounted on a 
chair modified for rotation and equipped with a 
safety belt and foot rest, and (e) a two-channel 
Brush recorder. 


1 This study was conducted by the research depart- 
ment of the Minneapolis-Honeywell Aeronautical Di- 
vision. The authors are indebted to Alex Weisz for 
stimulating this research. The pilots, Jim Bradford 
and Carl Gruber, are also thanked for their partici- 
pation in this study. 


Subjects. Two experienced pilots served as sub- 
jects for this study. 

Procedure. The experiment comprised six experi- 
mental sessions for each pilot. The first and last 
hour of each session was devoted to training with 
the moving drone display with one exception; i.e.,
on the first day, only the moving horizon indicator 
was practised. Between the two training periods, 
the pilot’s ability to recover from unusual attitudes, 
subsequent to semicircular canal stimulation, was 
tested with both indicators. 

Each training period consisted of a continuous 
tracking task for 20 two-min. trials with a one-min.
rest between trials. The recovery tests consisted of 
16 items per indicator, comprised of a displacement 
in pitch and roll, chosen randomly. The pilot was 
instructed to return the display elements to straight 
and level flight as rapidly as possible following the 
rotation. 

Preceding each recovery trial, the pilot was ro- 
tated to induce vertigo. Eight combinations of head 
positions and rotation directions were used. Rotations
to the right and left were given on alternate
2-trial intervals for each head position. The pilot 
was rotated for five turns with the head in the up- 
right position, four turns with the head on the right 
or left shoulder, and three turns with the head bent 
forward. Rotation speed was approximately 180°
per second.

The order of display presentation during the test 
period was counterbalanced in order to eliminate the 
bias of sequential effects of learning and semicircular 
stimulation. 

Fig. 1. Per cent reversal errors.


Table 1

Comparison of Reversal Errors During Recovery
with the Moving Drone and Moving
Horizon Attitude Indicators

Sessions    Horizon    Drone    Probability
1-2           14                   .058
              21                   .001
3-4            6                   .016
              12                   .001
5-6            1
              13                   .001


Recovery performance in pitch and roll was recorded
on a two-channel Brush recorder. From
these records, total time to recover and reversal
errors in pitch and roll were obtained. Time scores
were subjected to an analysis of variance. Reversal
errors were analyzed for pitch and roll separately,
employing the binomial expansion to determine the
probability that the observed differences between the
two indicators occurred by chance.
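
The binomial comparison described above can be sketched as follows. The sketch assumes a one-tailed test in which, under the chance hypothesis, each reversal error is equally likely to fall to either indicator; the article does not state these details, and the function name is hypothetical. The example uses the six errors recorded with the moving horizon in Sessions 3-4 (Table 1) against none with the moving drone, since no reversal errors occurred with that display after the second session.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Probability of k or more successes in n trials with success probability p."""
    return sum(
        comb(n, i) * p ** i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


horizon_errors = 6  # Sessions 3-4, moving horizon (Table 1)
drone_errors = 0    # no reversal errors with the drone after Session 2
total_errors = horizon_errors + drone_errors

# Chance probability that all six errors fall to the horizon display.
probability = binomial_tail(horizon_errors, total_errors)
print(round(probability, 3))
```

Under these assumptions the result, about .016, agrees with the probability entered for that row of Table 1.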


Results 


Figure 1 shows the percentage of reversal 
errors for both indicators in pitch and roll. 
It is noted that reversal errors were absent 
with the moving drone display after the sec- 
ond experimental session. Improvement in 
recovery performance is also observed with 
the moving horizon indicator for the roll di- 
mension. Recovery in pitch, however, was 
not significantly different from the first to the 
last day. The total number of reversal errors 
for the outside-in display was eight, as com- 
pared to 67 for the inside-out presentation 
(see Table 1). 

Recovery time data are given in Fig. 2.
Improvement is noted with both displays, but
the overall superiority is in favor of the moving
drone instrument. Improvement in recovery
time is also a significant effect (see
Table 2).


Fig. 2. Average time to recover from unusual
attitudes. (Axes: recovery time in sec.; attitude recovery tests.)


Discussion and Conclusions 


That training with the outside-in display 
for attitude indication did not interfere with 
performance on the inside-out display is 
clearly indicated by the results of this study. 
In addition, recovery performance with the 
moving drone display under the influence of 
vertigo was superior to the moving horizon 
indicator on the very first day of the experi- 
ment when neither pilot had previously ex- 
perienced the former instrument. Improve- 
ment in recovery performance was observed 
with both indicators from the first to the 
sixth experimental session. From this result, 
it may be speculated that the estimated acci- 
dent rate caused by pilot disorientation due 
to vertigo can be minimized if training pro- 
cedures for pilots included recovery experi- 
ence under induced vertigo similar to pro- 
cedures of this study. 


Table 2


Summary of Analysis of Variance for Recovery Time Data (2 pilots)


Source            df    Sums of Squares    Mean Squares
Indicators (I)               18.2659          18.2659
Sessions (S)                353.6962          70.7392
I x S                        31.5631           6.3126
                           (403.5252)
Within cells                741.9825           1.9946
Total

It can be concluded that (a) the outside-in
presentation leads to fewer misinterpretations
of aircraft attitudes than the conventional
display and (b) negative transfer effects between
instruments are either absent or negligible.


Received August 15, 1957. 
Early Publication. 


A Simple Method of Recording Paired Comparisons


Edward N. Hay 


Philadelphia 


There are many situations in which the ad- 
vantages of paired comparison outweigh the 
disadvantage of the greater amount of labor 
required. Arranging data in rank order, for 
example, is not laborious, but the ranks as- 
signed to items that are of the same or nearly 
the same value are not always easy to decide, 
because a number of items must be compared 
simultaneously or compared in turn, with 
check and recheck. It is like juggling five or 
six Indian Clubs at once. 

Paired comparison, however, reduces the 
process to a series of simple judgments of 
one item against another, with never more 
than two things involved in each comparison. 
There are, of course, more comparisons to 
make, but all except a few can be quickly decided.
Only when each item of a pair is of
nearly the same value is any great deliberation
necessary. This simplification of judgments
usually promotes greater accuracy.

However, there are many judgments to re- 
cord and summarize, which is inconvenient. 
This paper describes a way of recording the 
comparisons which is easy to do and which 
provides a clear and permanent record. It 
also furnishes a ready means of checking the 
arithmetical accuracy of the work. 

Figure 1 illustrates a simple way of record- 
ing the 153 comparisons necessary to deter- 
mine the order of rank of 18 items. The data 
relate to the amount of “know-how” required 
to perform acceptably the duties of 18 jobs 
which were being evaluated according to the 
Guide Chart-Profile Method (1). 

In Fig. 1, it can be seen that there are two 
cells in either of which the result of comparing
any two jobs can be entered by tally.
The tally is to be entered, in the case of each 
pair, in the line opposite the name of the job 
thought to rank highest. For example, the 
first two jobs in the table are No. 84-5 Amor- 
tization Schedule Clerk No. 2 and No. 93-6 
Letter Writer. After comparing these jobs, it 
was clear that the Letter Writer required the
greater know-how, so a tally was placed on 
the line opposite that job name in the column 
under the other job, Amortization Schedule 
Clerk No. 2. 

From this description it can be seen, after 
all 153 comparisons have been made, that the 
job having the greatest number of tallies on 
the horizontal line to its right will be the 
highest ranking job of the group of 18 jobs. 
If any two or more jobs have the same number
of tallies then they can be considered to
have been ranked equally. 

In the case illustrated in Fig. 1 there were 
8 judges. Their opinions were pooled by 
adding all their tallies, and the final rank of 
jobs was taken from those totals. 

The only check for accuracy that is nec- 
essary is to be sure that the grand total 
of all tallies is 153. For any number of 
items this total is found from the formula 
[N(N - 1)/2]. In the case at hand, N =
18 and therefore (N - 1) = 17 and (18 x
17)/2 is 153.
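
The recording, pooling, and checking steps described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the chart of Fig. 1 itself: the function names, the third job title, and the know-how values used to decide each pair are all hypothetical.

```python
from itertools import combinations


def expected_comparisons(n_items):
    """Number of pairs among n items: N(N - 1)/2."""
    return n_items * (n_items - 1) // 2


def tally_comparisons(items, prefer):
    """Give one tally to the preferred item of every pair.

    prefer(a, b) must return whichever of a or b is judged to rank higher.
    """
    tallies = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        winner = prefer(a, b)
        if winner not in (a, b):
            raise ValueError("prefer() must return one of the pair")
        tallies[winner] += 1
    # Accuracy check: the grand total of tallies must equal N(N - 1)/2.
    if sum(tallies.values()) != expected_comparisons(len(items)):
        raise RuntimeError("tally total does not match N(N - 1)/2")
    return tallies


def pool_tallies(judge_tallies):
    """Add the tallies of several judges, item by item."""
    pooled = {}
    for tallies in judge_tallies:
        for item, count in tallies.items():
            pooled[item] = pooled.get(item, 0) + count
    return pooled


def rank_items(tallies):
    """Order items from most tallies to fewest."""
    return sorted(tallies, key=tallies.get, reverse=True)


# Hypothetical know-how values standing in for one judge's opinions.
know_how = {
    "Letter Writer": 3,
    "Amortization Schedule Clerk No. 2": 2,
    "Hypothetical Job C": 1,
}
one_judge = tally_comparisons(
    list(know_how), lambda a, b: a if know_how[a] >= know_how[b] else b
)
print(rank_items(pool_tallies([one_judge, one_judge])))
print(expected_comparisons(18))
```

With several judges the check applies to each judge's sheet separately, so the pooled total for 8 judges and 18 jobs would be 8 times 153.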